Abstract

This paper presents a new functional programming model for graph 
structures called structured graphs. Structured graphs extend con- 
ventional algebraic datatypes with explicit definition and manip- 
ulation of cycles and/or sharing, and offer a practical and conve- 
nient way to program graphs in functional programming languages 
like Haskell. The representation of sharing and cycles (edges) em- 
ploys recursive binders and uses an encoding inspired by para- 
metric higher-order abstract syntax. Unlike traditional approaches 
based on mutable references or node/edge lists, well-formedness of 
the graph structure is ensured statically and reasoning can be done 
with standard functional programming techniques. Since the bind- 
ing structure is generic, we can define many useful generic com- 
binators for manipulating structured graphs. We give applications 
and show how to reason about structured graphs. 

Categories and Subject Descriptors D.3.2 [Programming Lan- 
guages]: Language Classifications — Functional Languages; F.3.3 
[Logics and Meanings of Programs]: Studies of Program Con- 
structs 

General Terms Languages 

Keywords Graphs, parametric HOAS, Haskell. 

1. Introduction 

Functional programming languages, including Haskell [31] and 
ML [29], excel at manipulating tree structures. In those languages 
algebraic datatypes describe the structure of values, and pattern 
matching is used to define functions on such tree structured val- 
ues. These mechanisms provide a high-level declarative program- 
ming model, which avoids explicit manipulation of pointers or ref- 
erences. Additionally algebraic datatypes facilitate reasoning about 
functions using standard proof methods, including structural in- 
duction. 

However, there are many kinds of data that are more naturally represented as graph structures rather than trees. Some examples include: typical compiler construction concerns, including control/data flow graphs or grammars [2]; entity-relational data models [8]; finite state machines; and transition systems. Sadly, functional programming languages do not have an equally good answer when it comes to manipulating graphs as they do for trees.
In impure functional languages, including ML or OCaml [24], a combination of algebraic datatypes and mutable references can be used to model sharing and cycles. However, this requires explicit manipulation of mutable references, which precludes many of the benefits of functional programming and algebraic datatypes. For example, observing sharing via pointer/reference comparison breaks referential transparency. Even if the use of mutable references is encapsulated behind a purely functional interface, reasoning about the implementation remains challenging [19, 35].

In call-by-need functional languages, like Haskell, it is possible 
to construct true cyclic structures. For example: 

ones = 1 : ones 

creates a cyclic list where the head contains the element 1 and the 
tail is a reference to itself. However, sharing is not observable and, 
from a purely semantic perspective, ones is no different from an 
infinite list of 1's. A drawback of this approach is that when an
operation is applied to this list, sharing is lost, even if it would be
possible to preserve sharing. For example, 

twos = map (\x -> x + 1) ones

creates an infinite list of 2's instead of a cyclic list with a single 2. 

To deal with the need for observable sharing, some researchers have proposed approaches that use recursive binders to model cycles and sharing [14, 15, 20]. For example, ones can be expressed using a recursive binder (μ) as follows:

ones = μx. (1 : x)

The idea is to be able to observe and manipulate the binders (μ) and variables (x), making sharing effectively explicit.

However, several questions need to be answered for this model 
to become effective in practice: 

1. What programming language mechanisms are needed to sup- 
port convenient representation and manipulation of such binders 
and variables? Does the approach guarantee well-formedness of 
cyclic structures (no unbound variables or other types of junk)? 

2. Is the model expressive enough? Can it deal with general graph 
edges, including the back edges and cross edges which arise in 
non-linear graph structures? 

3. Can the model deal with operations that require special treat- 
ment of fixpoint computations? For example, is it possible to 
exploit monotonicity to ensure termination on all inputs for an 
operation that checks the nullability [7, 28] of a grammar or 
regular expression? 

As far as we know, no approach deals with #3. Furthermore, all
approaches provide only partial answers to #1 and #2 (a detailed
discussion is given in Section 7), and fall short in providing a 
practical programming model for cyclic structures. 

This paper presents a new functional programming model for 
graph structures, called structured graphs, that builds on the idea of
using recursive binders to model sharing and cycles and provides
an answer to all three questions. Structured graphs can be viewed as
an extension of algebraic datatypes that allow explicit definition 
and manipulation of cycles or sharing by using recursive binders 
and variables to explicitly represent possible sharing points. To pro- 
vide a convenient and expressive programming interface, structured 
graphs use a binding representation based on parametric higher- 
order abstract syntax (PHOAS) [9]. This representation not only 
ensures well-formedness of the binding structure, but it also allows 
using standard proof methods, including structural induction, in
proofs for a large class of programs. To deal with cross edges we 
use a recursive multi-binder inspired by letrec expressions in func- 
tional programming. Furthermore, the expressiveness and flexibil- 
ity of our PHOAS-based representation allows us to define opera- 
tions that require special treatment of fixpoint computations. 

Since the binding structure is generic, it is possible to de- 
fine many useful generic combinators for manipulating structured 
graphs. By employing some lightweight datatype-generic program- 
ming [17] techniques, we also propose a datatype-generic formula- 
tion of structured graphs. This formulation enables the definitions 
of useful combinators like generic folds and transformations. Us- 
ing such combinators it is often possible to write programs for 
processing graphs that are no more difficult to write than programs 
on conventional algebraic datatypes. 

The programming model also allows for transformations requir- 
ing complex manipulation of the binding structure, which are less 
easily captured in combinators. Those operations can always be de- 
fined by direct pattern matching on the binding structure (variables 
and binders). 

To summarize, our contributions are: 

• Structured graphs: a new programming model extending the 
classic notion of algebraic datatypes with cycles and sharing. 
This model supports the same benefits as algebraic datatypes 
and facilitates reasoning over cyclic structures. The binding 
infrastructure is conveniently defined and manipulated using a 
PHOAS-based representation. 

• Generic combinators and infrastructure: Additional conve- 
nience is provided through the use of generic combinators for 
folds and transformations. Such generic combinators can also 
encapsulate the use of special fixpoints for certain operations. 

• Recursive binders using PHOAS: We also show how to de- 
fine recursive binders with PHOAS. The recursive multi-binder 
presented in Section 3.2 is particularly relevant, since it enables 
the definition of cross edges. 

The presentation of our work uses Haskell. Occasionally we use 
some common extensions implemented in the GHC compiler. The 
code for this paper is available online at http://ropas.snu.ac.kr/~bruno/papers/StructuredGraphs.zip.

2. Parametric HOAS 

This section reviews the key advantages of Parametric Higher- 
Order Abstract Syntax (PHOAS) [9] for representing binders. 
There are several approaches to binding, but PHOAS has a unique 
combination of advantages: 1) guaranteed well-scopedness; 2) no 
explicit manipulation of environments; 3) easy to define operations. 
The first two advantages are due to the fact that PHOAS is a higher- 
order approach in which the function space of the meta-language 
is used to encode the binders of the object language. The reuse 
of the meta-language function space avoids common issues with 
first-order approaches like α-equivalence and defining the infrastructure for capture-avoiding substitution. Other higher-order approaches like classic HOAS [34] share these advantages. However,



data PLambda a =
    Var a
  | Int Int
  | Bool Bool
  | If (PLambda a) (PLambda a) (PLambda a)
  | Add (PLambda a) (PLambda a)
  | Mult (PLambda a) (PLambda a)
  | Eq (PLambda a) (PLambda a)
  | Lam (a -> PLambda a)
  | App (PLambda a) (PLambda a)

newtype Lambda = Hide { reveal :: forall a. PLambda a }



Figure 1. PHOAS-encoded lambda calculus with integers, 
booleans and some primitives. 

with classic HOAS (and other higher-order approaches) many op- 
erations are non-trivial to define in languages like Haskell, whereas 
with PHOAS various operations are generally easier to define. This 
unique combination of features makes PHOAS a particularly at- 
tractive foundation for our work. 

To illustrate the advantages of PHOAS in more detail, we use 
the lambda calculus (with standard extensions) presented in Fig- 
ure 1. Lambda terms are encoded by the newtype Lambda, which 
is defined in terms of the datatype PLambda a. 

Well-scopedness The type argument a in PLambda a is supposed to be abstract: it should not be instantiated to a concrete type when constructing lambda terms. To enforce this, a universal quantifier (∀a. PLambda a) is used in the definition of Lambda. Note that, in Haskell, the following type synonym:

type Lambda = forall a. PLambda a

can be problematic as an encoding of Lambda. This is because in Haskell all universal quantifications are pushed to the left-most position after expansion of the type synonym. This sometimes makes types less polymorphic than expected. The use of a newtype circumvents this problem, at the cost of introducing explicit embedding and projection functions, Hide and reveal.

Using a as an abstract type ensures that only variables bound 
by a constructor Lam can be used in the constructor Var. For 
example, the identity function can be defined as: 

idLambda = Hide (Lam (\x -> Var x))

However the following terms are not valid, and are rejected by the 
type system: 

invalid1 = Hide (Var 1)

invalid2 y = Hide (Lam (\x -> Var y))

The first example tries to use an integer where a value of the 
abstract type a is expected. The second example tries to use a 
variable that is not (directly or indirectly) bound by a Lam and 
as such has a type different from the abstract type a. 

Using parametricity [36, 37] it is possible to prove that PHOAS- 
encoded terms are well-scoped and do not allow bad values to be 
used in variable positions [3]. 

No explicit manipulation of environments Functions defined 
over PHOAS-based representations avoid the need for explicit ma- 
nipulation of environments carrying the binding information. In- 
stead, environments are implicitly handled by the meta-language. 
The evaluator for our lambda calculus presented in Figure 2 illustrates this. The type of the evaluator is simply Lambda -> Value:
there is no need for an explicitly passed environment. The defini- 
tion of the evaluator is mostly straightforward, although it is worth 



data Value = VI Int | VB Bool | VF (Value -> Value)
eval :: Lambda -> Value
eval e = evalP (reveal e) where
  evalP :: PLambda Value -> Value

Figure 2. An evaluator for the PHOAS-encoded lambda calculus.



data PLambda a = Mu1 (a -> PLambda a) | ...

eval :: Lambda -> Value
eval e = evalP (reveal e) where
  evalP :: PLambda Value -> Value
  evalP (Mu1 f) = fix (evalP . f)

fix :: (a -> a) -> a
fix f = let r = f r in r   -- rather than f (fix f)

Figure 3. Extending the PHOAS-encoded lambda calculus with μ-binders.

data PLambda a = Mu2 ([a] -> [PLambda a]) | ...

eval :: Lambda -> Value
eval e = evalP (reveal e) where
  evalP :: PLambda Value -> Value
  evalP (Mu2 f) = head $ fix (map evalP . f)



noting that the interpreter is a partial function that can raise runtime errors from failed pattern matching. The crucial step in the interpreter is to reveal the lambda term e and instantiate the abstract type with a suitable type for defining evaluation. In the case of evaluation, the obvious choice for instantiation is Value.

Evaluation of the lambda expression (λx → 3 + x) 4 proceeds as follows:

t1 = Hide (App (Lam (\x -> Add (Int 3) (Var x))) (Int 4))
test = show (eval t1) -- returns "7"

The PHOAS evaluator is simpler than evaluators based on first- 
order binding (for example variables as strings or de Bruijn in- 
dexes), which require an explicit environment to be passed around 
in the evaluator. While monads [38] can encapsulate the plumbing 
of the environment, they also give the interpreter an imperative feel 
and force sequentiality on computations that could be parallel.
In contrast the PHOAS interpreter is written in a purely functional 
style that supports simple equational reasoning. 
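The body of the Figure 2 evaluator is not reproduced in this text. As a hedged illustration only, the following self-contained sketch shows one way such an evaluator could look; it is not claimed to be the authors' code. It writes the embedding and projection as Hide and reveal, names the interpretation function evalP, and adds a Show instance for Value, all of which are assumptions of this sketch. As the text notes, the evaluator is partial: ill-typed terms reach the error branches.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Figure 1: the PHOAS-encoded lambda calculus.
data PLambda a
  = Var a
  | Int Int
  | Bool Bool
  | If (PLambda a) (PLambda a) (PLambda a)
  | Add (PLambda a) (PLambda a)
  | Mult (PLambda a) (PLambda a)
  | Eq (PLambda a) (PLambda a)
  | Lam (a -> PLambda a)
  | App (PLambda a) (PLambda a)

-- Closed terms: the variable type a must stay abstract.
newtype Lambda = Hide { reveal :: forall a. PLambda a }

data Value = VI Int | VB Bool | VF (Value -> Value)

-- Assumed Show instance (not from the text); functions print opaquely.
instance Show Value where
  show (VI n) = show n
  show (VB b) = show b
  show (VF _) = "<function>"

eval :: Lambda -> Value
eval e = evalP (reveal e)
  where
    -- Variables are instantiated to Value, so a Var simply holds the
    -- value it is bound to: no environment is threaded through.
    evalP :: PLambda Value -> Value
    evalP (Var v)    = v
    evalP (Int n)    = VI n
    evalP (Bool b)   = VB b
    evalP (If c t f) = case evalP c of
      VB True  -> evalP t
      VB False -> evalP f
      _        -> error "If: condition is not a boolean"
    evalP (Add x y)  = arith (+) (evalP x) (evalP y)
    evalP (Mult x y) = arith (*) (evalP x) (evalP y)
    evalP (Eq x y)   = case (evalP x, evalP y) of
      (VI m, VI n) -> VB (m == n)
      (VB p, VB q) -> VB (p == q)
      _            -> error "Eq: incomparable values"
    -- Object-level binding reuses the meta-level function space.
    evalP (Lam f)    = VF (evalP . f)
    evalP (App f x)  = case evalP f of
      VF g -> g (evalP x)
      _    -> error "App: not a function"

    arith :: (Int -> Int -> Int) -> Value -> Value -> Value
    arith op (VI m) (VI n) = VI (op m n)
    arith _ _ _            = error "arithmetic on non-integers"

-- (\x -> 3 + x) 4
t1 :: Lambda
t1 = Hide (App (Lam (\x -> Add (Int 3) (Var x))) (Int 4))
```

With this sketch, show (eval t1) evaluates to "7".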

Easy to define operations Many operations are easier to define 
with PHOAS than with classic HOAS. As noted by Fegaras and 
Sheard [14], to evaluate a version of the lambda calculus encoded 
with classic HOAS, an extra function (reify) is needed to invert the 
result of evaluation: 

data Exp = L (Exp -> Exp) | A Exp Exp
evalExp :: Exp -> Value

evalExp (L f) = VF (evalExp . f . reify)
evalExp (A e1 e2) = case evalExp e1 of
  VF f -> f (evalExp e2)

reify :: Value -> Exp

reify (VF f) = L (reify . f . evalExp)

More generally, classic HOAS requires inverse functions [27], but good inverse functions do not always exist or can be significantly harder to define. For example, defining a pretty-printing function with classic HOAS requires an inverse parsing function with the property

print (parse x) = x.
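To make the role of reify concrete, here is a small runnable version of the classic HOAS evaluator above. The N constructor for integer literals, the VI case of Value and the helper asInt are assumptions added only so that results can be observed; the L, A and VF parts correspond to the code in the text.

```haskell
-- Classic HOAS: object-level binders are meta-level functions Exp -> Exp.
-- N is an assumed integer literal, added only so results are observable.
data Exp = L (Exp -> Exp) | A Exp Exp | N Int

data Value = VF (Value -> Value) | VI Int

evalExp :: Exp -> Value
-- f consumes an Exp, but at run time only a Value is available,
-- so reify must turn the Value back into an Exp.
evalExp (L f)     = VF (evalExp . f . reify)
evalExp (A e1 e2) = case evalExp e1 of
  VF f -> f (evalExp e2)
  VI _ -> error "applying a non-function"
evalExp (N n)     = VI n

-- reify goes in the opposite direction to evalExp.
reify :: Value -> Exp
reify (VF f) = L (reify . f . evalExp)
reify (VI n) = N n

asInt :: Value -> Maybe Int
asInt (VI n) = Just n
asInt _      = Nothing
```

Evaluating asInt (evalExp (A (L (\f -> A f (N 3))) (L (\x -> x)))) gives Just 3: passing a function as an argument makes the value round-trip through reify and evalExp.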



Figure 4. Extending the interpreter with a recursive multi-binder. 

3. Recursive Binders using Parametric HOAS 

The traditional definition of PHOAS can be extended to model re- 
cursive binding of variables, to support single or mutual recursion. 

3.1 Encoding μ-binders with Parametric HOAS

One way to support recursive functions in our lambda calculus
interpreter is to extend it with a recursive binder μ. With such a
μ binder the factorial function can be defined as follows.

μf. λn → if (n = 0) then 1 else n * f (n − 1)

Figure 3 shows the extension to the calculus and interpreter in Figures 1 and 2 that is needed to encode μ binders. This extension is not very different from a conventional λ binder: it introduces a new constructor Mu1 with the same type as Lam. However, the semantics of Mu1 f is different from a regular lambda binder: it takes the fixpoint of the composition of f with the interpreter. The fixpoint is easily encoded in Haskell by the function fix. The definition of fix exploits the call-by-need semantics of Haskell to create sharing. Another way to define fix is as f (fix f), but this does not share results.

With this extension the encoding of the factorial function is: 

fact = Mu1 (\f -> Lam (\n ->
         If (Eq (Var n) (Int 0))
            (Int 1)
            (Mult (Var n)
                  (App (Var f) (Add (Var n) (Int (-1)))))))
test1 = Hide (App fact (Int 7))

The result of running eval test1 is 5040 (the factorial of 7).
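A runnable sketch combining Figures 1 to 3 follows. As before, Hide, reveal, evalP and the Show instance for Value are naming and presentation assumptions rather than code fixed by the text. It uses fix from Data.Function, which is defined with the same sharing let-binding as in Figure 3.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Function (fix)  -- fix f = let x = f x in x (shares the result)

data PLambda a
  = Var a
  | Int Int
  | Bool Bool
  | If (PLambda a) (PLambda a) (PLambda a)
  | Add (PLambda a) (PLambda a)
  | Mult (PLambda a) (PLambda a)
  | Eq (PLambda a) (PLambda a)
  | Lam (a -> PLambda a)
  | App (PLambda a) (PLambda a)
  | Mu1 (a -> PLambda a)            -- recursive binder (Figure 3)

newtype Lambda = Hide { reveal :: forall a. PLambda a }

data Value = VI Int | VB Bool | VF (Value -> Value)

instance Show Value where
  show (VI n) = show n
  show (VB b) = show b
  show (VF _) = "<function>"

eval :: Lambda -> Value
eval e = evalP (reveal e)
  where
    evalP :: PLambda Value -> Value
    evalP (Var v)    = v
    evalP (Int n)    = VI n
    evalP (Bool b)   = VB b
    evalP (If c t f) = case evalP c of
      VB b -> if b then evalP t else evalP f
      _    -> error "If: not a boolean"
    evalP (Add x y)  = arith (+) (evalP x) (evalP y)
    evalP (Mult x y) = arith (*) (evalP x) (evalP y)
    evalP (Eq x y)   = case (evalP x, evalP y) of
      (VI m, VI n) -> VB (m == n)
      _            -> error "Eq: expects integers"
    evalP (Lam f)    = VF (evalP . f)
    evalP (App f x)  = case evalP f of
      VF g -> g (evalP x)
      _    -> error "App: not a function"
    -- The bound variable stands for the value of the whole term,
    -- so the knot is tied with fix.
    evalP (Mu1 f)    = fix (evalP . f)

    arith :: (Int -> Int -> Int) -> Value -> Value -> Value
    arith op (VI m) (VI n) = VI (op m n)
    arith _ _ _            = error "arithmetic on non-integers"

-- mu f. \n -> if n == 0 then 1 else n * f (n - 1)
fact :: PLambda a
fact = Mu1 (\f -> Lam (\n ->
  If (Eq (Var n) (Int 0))
     (Int 1)
     (Mult (Var n) (App (Var f) (Add (Var n) (Int (-1)))))))

test1 :: Lambda
test1 = Hide (App fact (Int 7))
```

Here show (eval test1) evaluates to "5040".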

3.2 Encoding a recursive multi-binder 

A μ binder is sufficient for expressing simple recursion, but mutual
recursion requires some additional infrastructure. Mutually recur- 
sive definitions bind several variables at once (one for each mutu- 
ally recursive definition). One way to achieve this is illustrated in 
Figure 4. The idea is to generalize the recursive binder type in such 



a way that it takes as the input a list of variables and returns a list of 
lambda expressions. This form of multi-binder is sufficient to ex- 
press the semantics of letrec. The head of the list computed by the 
fixpoint is the body of letrec. 

Consider the following example of mutually recursive binding:

let odd  = λn → if (n = 0) then False else even (n − 1)
    even = λn → if (n = 0) then True else odd (n − 1)
in odd 10

It is encoded with a PHOAS multi-binder as follows: 

evenodd :: Lambda

evenodd = Hide (Mu2 (\ ~(_ : odd : even : _) ->
  [ App (Var odd) (Int 10),             -- body of letrec
    Lam (\n ->                          -- definition of odd
      If (Eq (Var n) (Int 0))
         (Bool False)
         (App (Var even) (Add (Var n) (Int (-1))))),
    Lam (\n ->                          -- definition of even
      If (Eq (Var n) (Int 0))
         (Bool True)
         (App (Var odd) (Add (Var n) (Int (-1)))))
  ]))

The recursive multi-binder specifies the set of mutually recursive 
definitions (in this case even and odd) and also the body of letrec 
(in this case odd 10). Therefore, there are three lambda expressions 
in the output list. The first element in the list is the expression 
representing the body of letrec. The other two elements are the
expressions representing the definitions of odd and even. The input
list defines names that allow recursive references to each of the 
definitions. Finally, a laziness annotation (~) is necessary to ensure 
that the pattern matching of the input list is not stricter than it 
should be. Without that annotation, evaluation of the expression 
diverges. 
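Similarly, the multi-binder example can be run with an evaluator that handles Mu2 as in Figure 4. In this sketch the datatype is trimmed to the constructors the example needs, the naming assumptions of the earlier sketches apply, and oddAt is a hypothetical generalisation of evenodd to an arbitrary argument. The lazy pattern in the generator is kept, for the reason explained above.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Function (fix)

data PLambda a
  = Var a
  | Int Int
  | Bool Bool
  | If (PLambda a) (PLambda a) (PLambda a)
  | Add (PLambda a) (PLambda a)
  | Eq (PLambda a) (PLambda a)
  | Lam (a -> PLambda a)
  | App (PLambda a) (PLambda a)
  | Mu2 ([a] -> [PLambda a])        -- recursive multi-binder (Figure 4)

newtype Lambda = Hide { reveal :: forall a. PLambda a }

data Value = VI Int | VB Bool | VF (Value -> Value)

instance Show Value where
  show (VI n) = show n
  show (VB b) = show b
  show (VF _) = "<function>"

eval :: Lambda -> Value
eval e = evalP (reveal e)
  where
    evalP :: PLambda Value -> Value
    evalP (Var v)    = v
    evalP (Int n)    = VI n
    evalP (Bool b)   = VB b
    evalP (If c t f) = case evalP c of
      VB b -> if b then evalP t else evalP f
      _    -> error "If: not a boolean"
    evalP (Add x y)  = case (evalP x, evalP y) of
      (VI m, VI n) -> VI (m + n)
      _            -> error "Add: expects integers"
    evalP (Eq x y)   = case (evalP x, evalP y) of
      (VI m, VI n) -> VB (m == n)
      _            -> error "Eq: expects integers"
    evalP (Lam f)    = VF (evalP . f)
    evalP (App f x)  = case evalP f of
      VF g -> g (evalP x)
      _    -> error "App: not a function"
    -- Tie the knot over the whole list; its head is the letrec body.
    evalP (Mu2 f)    = head (fix (map evalP . f))

-- Hypothetical generalisation of evenodd: let odd/even ... in odd k.
oddAt :: Int -> Lambda
oddAt k = Hide (Mu2 (\ ~(_ : odd : even : _) ->
  [ App (Var odd) (Int k)                      -- body of letrec
  , Lam (\n -> If (Eq (Var n) (Int 0))         -- definition of odd
                  (Bool False)
                  (App (Var even) (Add (Var n) (Int (-1)))))
  , Lam (\n -> If (Eq (Var n) (Int 0))         -- definition of even
                  (Bool True)
                  (App (Var odd) (Add (Var n) (Int (-1)))))
  ]))

evenodd :: Lambda
evenodd = oddAt 10
```

Here show (eval evenodd) gives "False", since 10 is even. Removing the ~ makes evaluation loop, because matching the input list would then force the fixpoint that is still being computed.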

Implicit assumptions Note that there are some implicit assumptions not captured by Haskell's type system. In particular, the list computed by the fixpoint must include at least one element. If the list has no elements, then taking its head fails. The fixpoint function fix was assigned type (a → a) → a, but it can be generalized to (b → a) → a where a is any subtype of b. Haskell does not have subtyping, but it is meaningful to consider subtyping relations between list types. For example, if a type $[T]^n$ is defined to mean lists of values of type T with at least n items, then $[T]^n$ can be viewed as a subtype of $[T]^m$ when $m \le n$. Using this notation, the type of the Mu2 constructor could be defined to allow any input list that is at least as long as the output list (which must still have at least one element).

$$[a]^m \to [\mathit{PLambda}\ a]^n \quad \text{where } m \le n \wedge n > 0$$

This more liberal assumption allows infinite lists as input. This 
often makes the algorithms simpler because it is not necessary to 
generate an input list of the exact size of the output list. 

Note that these implicit assumptions can be enforced with a type 
system. For example, a dependently typed language or the Haskell 
extension for GADTs [33] can define types for fixed-size vectors. 
Here we prefer to keep the code simple and more accessible to the 
reader. However the reader interested in extensions of structured 
graphs that statically ensure such size constraints can look at recent 
work by Oliveira and Löh [30].
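As a hedged sketch of how the GADT extension can express such size constraints, the following defines length-indexed vectors and a head function that cannot fail. The function vfixHead is a hypothetical sized counterpart of the fixpoint used for Mu2, in which input and output have the same non-zero length; it only illustrates the idea and is not the encoding of Oliveira and Löh [30]. GHC rejects lazy (~) patterns on GADT constructors such as VCons, so the generator below reads its input through vhead and vtail instead, which is equally lazy.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Function (fix)

-- Type-level natural numbers (promoted to kinds by DataKinds).
data Nat = Z | S Nat

-- Vec n a: lists of exactly n elements of type a.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Total: the type rules out the empty vector.
vhead :: Vec ('S n) a -> a
vhead (VCons x _) = x

vtail :: Vec ('S n) a -> Vec n a
vtail (VCons _ xs) = xs

vtoList :: Vec n a -> [a]
vtoList VNil         = []
vtoList (VCons x xs) = x : vtoList xs

-- Hypothetical sized analogue of the Mu2 fixpoint: input and output have
-- the same non-zero length, so taking the head cannot fail. The generator
-- must build its output without first inspecting its input.
vfixHead :: (Vec ('S n) a -> Vec ('S n) a) -> a
vfixHead f = vhead (fix f)

-- Two mutually recursive lists, in the style of the even/odd example:
-- xs0 = 1 : xs1 and xs1 = 2 : xs0.
alternating :: [Int]
alternating = vfixHead (\xs -> VCons (1 : vhead (vtail xs))
                               (VCons (2 : vhead xs) VNil))
```

Here take 4 alternating gives [1,2,1,2].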

4. Structured Graphs 

Structured graphs use the recursive binders introduced in Section 3 
to describe cyclic structures. We consider two types of cyclic struc- 
tures: cyclic streams and cyclic binary trees. These two types of 



structures are useful to illustrate two different types of edges that 
arise with structured graphs: back edges and cross edges. What is 
interesting about cyclic streams is that they only allow back edges, 
whereas most other types of structures (like cyclic binary trees) also 
allow cross edges. Back edges are modelled with simple μ binders,
while cross edges require the recursive multi-binder introduced in 
Section 3.2. 

4.1 Cyclic Streams and Back Edges 

A datatype for cyclic streams can be defined as follows: 

data PStream a v =
    Var v
  | Mu (v -> PStream a v)
  | Cons a (PStream a v)

newtype Stream a = Hide { reveal :: forall v. PStream a v }

This datatype of streams has the usual Cons constructor and also 
PHOAS binding constructs: the variable case and the simple recur- 
sive binder. There are two possible interpretations for this datatype: 
an inductive and a coinductive one.¹ In the inductive interpretation, which is the one we use for most operations on streams, this datatype represents finitely representable cyclic streams such as:

s1 = Hide (Cons 1 (Mu (\x -> Cons 2 (Var x))))

s2 = Hide (Mu (\x -> Cons 1 (Cons 2 (Var x))))

Acyclic (and infinite) streams such as the stream of natural num- 
bers are not representable under this interpretation. On the other 
hand the inductive interpretation allows us to define several use- 
ful operations like decidable equality procedures on cyclic streams. 
The coinductive interpretation admits acyclic infinite streams, but 
some operations are no longer valid.

Note that the only types of cycles needed in structures like 
streams are back edges: edges that point to some previous point 
in the structure. This stems from the fact that streams are linear
structures and "pointing back" is the only option. 

A final remark is that the type Stream a allows values like 
Hide (Mu Var), which do not represent any stream. Section 5 gives a
representation that prevents such junk terms. 

Folds on Streams The traditional notion of a fold on a list can 
be extended to cyclic streams. A cyclic fold visits each node only 
once. Classical imperative graph algorithms normally keep a list of 
visited nodes to avoid visiting a node twice. With our representation 
of streams such bookkeeping is not necessary. For example, the 
function elems visits all the elements in a stream exactly once and 
returns a list of visited elements. 

elems :: Stream a -> [a]
elems s = pelems (reveal s) where -- a fold
  pelems :: PStream a [a] -> [a]
  pelems (Var v)     = v
  pelems (Mu g)      = pelems (g [])
  pelems (Cons x xs) = x : pelems xs

As in the evaluation function for the PHOAS-based interpreter, an auxiliary operation pelems is defined over the PStream type. The abstract type used for variables is instantiated to [a]. The Cons case is trivial and the variable case is easy too: it simply returns the list value v. Evaluation of a back edge must visit the elements only once. To get access to the elements of the list, the generator function g must be applied somehow. Taking the fixpoint of g is wrong because it could generate an infinite list. Instead, the empty list is passed to g, so that when a variable (a back edge) is reached it returns that empty list.

¹ In Haskell it is not possible to convey to the compiler which interpretation to use. However, other languages, including Coq, allow for such a choice.

More generally the recursion pattern of such fold-like opera- 
tions can be captured by the following combinator: 

foldStream :: (a -> b -> b) -> b -> Stream a -> b

foldStream f k s = pfoldStream (reveal s) where
  pfoldStream (Var x)     = x
  pfoldStream (Mu g)      = pfoldStream (g k)
  pfoldStream (Cons x xs) = f x (pfoldStream xs)

which allows writing elems more compactly as: 

elems' = foldStream (:) [] 
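The following self-contained sketch collects the stream definitions used so far (the datatype, s1, s2, elems and foldStream) so that they can be loaded into GHCi, writing the embedding and projection as Hide and reveal. It uses eta-expanded definitions such as elems s = pelems (reveal s): composing the higher-rank field selector reveal point-free may be rejected by recent GHC versions, so applying it directly is the safer assumption.

```haskell
{-# LANGUAGE RankNTypes #-}

data PStream a v
  = Var v
  | Mu (v -> PStream a v)
  | Cons a (PStream a v)

newtype Stream a = Hide { reveal :: forall v. PStream a v }

s1, s2 :: Stream Int
s1 = Hide (Cons 1 (Mu (\x -> Cons 2 (Var x))))   -- 1, 2, 2, 2, ...
s2 = Hide (Mu (\x -> Cons 1 (Cons 2 (Var x))))   -- 1, 2, 1, 2, ...

-- Visit every node once: a back edge (Var) contributes nothing further.
elems :: Stream a -> [a]
elems s = pelems (reveal s)
  where
    pelems :: PStream a [a] -> [a]
    pelems (Var v)     = v
    pelems (Mu g)      = pelems (g [])
    pelems (Cons x xs) = x : pelems xs

-- The same recursion pattern, abstracted over the Cons case and over
-- the value returned at back edges.
foldStream :: (a -> b -> b) -> b -> Stream a -> b
foldStream f k s = pfoldStream (reveal s)
  where
    pfoldStream (Var x)     = x
    pfoldStream (Mu g)      = pfoldStream (g k)
    pfoldStream (Cons x xs) = f x (pfoldStream xs)

elems' :: Stream a -> [a]
elems' = foldStream (:) []
```

For example, elems s1 and elems s2 both evaluate to [1,2]: each node is visited once, and the back edge contributes the empty list.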

Cyclic Folds on Streams Another class of operations definable 
on cyclic streams are cyclic folds. Cyclic folds allow us to define 
operations that use cyclic streams as if they were represented as an 
infinite stream. A cyclic fold combinator can be defined as follows: 

cfoldStream :: (a -> b -> b) -> Stream a -> b

cfoldStream f s = pcfoldStream (reveal s) where
  pcfoldStream (Var x)     = x
  pcfoldStream (Mu g)      = fix (pcfoldStream . g)
  pcfoldStream (Cons x xs) = f x (pcfoldStream xs)

The difference to a regular fold on streams is that there is no base 
case. Instead, in the case for the binder the fixpoint of the function 
pcfoldStream o g is provided as an argument to g and used in the 
variables. Examples of cyclic folds include a toList operation that 
computes an infinite list from a cyclic stream, or a pretty printing 
operation (upp) that computes an infinite string representation. 

toList = cfoldStream (:)

upp = cfoldStream (\x s -> show x ++ " : " ++ s)
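A runnable sketch of the cyclic fold follows, under the same naming assumptions and with fix taken from Data.Function. Because the result is built by tying a knot, the folding function must be lazy enough to be productive: (:) is, whereas a strict function such as (+) would make cfoldStream diverge on any cyclic stream.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Function (fix)

data PStream a v
  = Var v
  | Mu (v -> PStream a v)
  | Cons a (PStream a v)

newtype Stream a = Hide { reveal :: forall v. PStream a v }

s1, s2 :: Stream Int
s1 = Hide (Cons 1 (Mu (\x -> Cons 2 (Var x))))
s2 = Hide (Mu (\x -> Cons 1 (Cons 2 (Var x))))

-- No base case: a binder feeds its own (lazily computed) result back to g.
cfoldStream :: (a -> b -> b) -> Stream a -> b
cfoldStream f s = pcfoldStream (reveal s)
  where
    pcfoldStream (Var x)     = x
    pcfoldStream (Mu g)      = fix (pcfoldStream . g)
    pcfoldStream (Cons x xs) = f x (pcfoldStream xs)

-- Infinite list view of a cyclic stream.
toList :: Stream a -> [a]
toList = cfoldStream (:)

-- Infinite string rendering.
upp :: Show a => Stream a -> String
upp = cfoldStream (\x s -> show x ++ " : " ++ s)
```

For instance, take 6 (toList s2) gives [1,2,1,2,1,2], and take 12 (upp s2) gives "1 : 2 : 1 : ".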

Sharing-preserving Transformations An example of a sharing- 
preserving operation is the map function (smap) on cyclic streams: 

smap :: (a -> b) -> Stream a -> Stream b

smap f s = Hide (psmap f (reveal s)) where
  psmap f (Var v)     = Var v
  psmap f (Mu g)      = Mu (psmap f . g)
  psmap f (Cons x xs) = Cons (f x) (psmap f xs)

In the definition of psmap, variables are mapped to variables and 
the binders are mapped to binders. In other words the structure 
of the original stream is preserved. Only the elements change. 
Another difference to functions defined previously is that, because 
a new stream is produced as the final result, at the end the resulting 
PStream b v is packed into a Stream b.
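To see that smap preserves sharing, the following sketch maps over the cyclic ones stream from the introduction, written here with an explicit binder. The helper elems is repeated from earlier in this section, and Hide and reveal again stand for the embedding and projection. As in the text, psmap takes the function as an explicit argument, which keeps it a closed, fully polymorphic local definition that can be passed to Hide.

```haskell
{-# LANGUAGE RankNTypes #-}

data PStream a v = Var v | Mu (v -> PStream a v) | Cons a (PStream a v)

newtype Stream a = Hide { reveal :: forall v. PStream a v }

-- The cyclic "ones" from the introduction, with an explicit binder.
ones :: Stream Int
ones = Hide (Mu (\x -> Cons 1 (Var x)))

s2 :: Stream Int
s2 = Hide (Mu (\x -> Cons 1 (Cons 2 (Var x))))

-- Variables map to variables and binders to binders: only elements change.
smap :: (a -> b) -> Stream a -> Stream b
smap f s = Hide (psmap f (reveal s))
  where
    psmap _ (Var v)     = Var v
    psmap h (Mu g)      = Mu (psmap h . g)
    psmap h (Cons x xs) = Cons (h x) (psmap h xs)

-- elems, used here to observe the result.
elems :: Stream a -> [a]
elems s = pelems (reveal s)
  where
    pelems :: PStream a [a] -> [a]
    pelems (Var v)     = v
    pelems (Mu g)      = pelems (g [])
    pelems (Cons x xs) = x : pelems xs
```

Here elems (smap (+1) ones) evaluates to [2]: in contrast with map on the lazy list in the introduction, the result is still a one-element cycle rather than an infinite list of 2's.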

Structural equality Cyclic (inductive) streams always have a fi- 
nite representation, so they can be compared for structural equality 
without danger of nontermination: 

instance Eq a => Eq (Stream a) where
  s1 == s2 = peq 0 (reveal s1) (reveal s2)

peq :: Eq a => Int -> PStream a Int -> PStream a Int -> Bool
peq n (Var x)     (Var y)     = x == y
peq n (Mu f)      (Mu g)      = peq (n + 1) (f n) (g n)
peq n (Cons x xs) (Cons y ys) = x == y && peq n xs ys
peq _ _           _           = False

The idea for defining structural equality is to replace each variable with a fresh label, which in this case is an integer. Then the variable case just compares whether the two labels are the same. The most interesting case is the Mu case. The idea is to pass the same label (n) to the generator functions f and g and to generate a new fresh label for the next time a new label is needed. Finally, the last two cases are standard.

stail :: Stream a -> Stream a
stail s = Hide (joinPStream (ptail (reveal s))) where
  ptail (Cons x xs) = xs
  ptail (Mu g) = Mu (\x ->
    let phead (Mu g)      = phead (g x)
        phead (Cons y ys) = y
    in ptail (g (Cons (phead (g x)) x)))

Figure 5. Tail of a stream.

A Quasi-monad Structure The PStream a type constructor has 
a structure similar to a monad: it supports return and join (or
concat) operations. However, PStream a is not a functor (that is,
it does not support a functorial mapping operation), failing to be a 
monad for this reason. The return of this quasi-monad is Var 

retPStream :: v -> PStream a v
retPStream = Var

and the join operation has a fairly straightforward definition:

joinPStream :: PStream a (PStream a v) -> PStream a v
joinPStream (Var v)     = v
joinPStream (Mu g)      = Mu (joinPStream . g . Var)
joinPStream (Cons x xs) = Cons x (joinPStream xs)

The joinPStream stream operation is useful for defining vari- 
ous operations on streams. For example, consider an operation 
unrollStream that unrolls a cycle once, as in these examples: 

*Streams> unrollStream s2
1 : 2 : Mu (\a -> 1 : 2 : a)

*Streams> unrollStream (unrollStream s2)
1 : 2 : 1 : 2 : Mu (\a -> 1 : 2 : a)

This operation can be defined using joinPStream as follows: 

unrollStream :: Stream a -> Stream a
unrollStream s = Hide (joinPStream (punroll (reveal s)))

punroll :: PStream a (PStream a v) -> PStream a (PStream a v)
punroll (Mu g)      = g (joinPStream (Mu g))
punroll (Cons x xs) = Cons x (punroll xs)

Note that punroll does not define a variable case ( Var). This is 
because punroll is only called at the top-level structure and a 
variable cannot appear there because there is no variable that can 
be bound. In other words, it is hard to fill the ... in an expression like Hide (Var ...). For the recursive binder case, when Mu g is found, the generator function g is called to generate one level of the structure. The argument to g is the stream itself, leading to a nested stream which is collapsed using joinPStream. As observed by Chlipala [9], operations like joinPStream can be used to effectively implement substitution of variables in binders.
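The following sketch packages joinPStream and unrollStream with a small pretty-printer so that results can be inspected. The printer (pshow, varName and the Show instance) is an assumption of this sketch, written to produce output in the format of the GHCi session shown earlier in this section; the paper's own Show instance is not part of this text.

```haskell
{-# LANGUAGE RankNTypes #-}

data PStream a v = Var v | Mu (v -> PStream a v) | Cons a (PStream a v)

newtype Stream a = Hide { reveal :: forall v. PStream a v }

s1, s2 :: Stream Int
s1 = Hide (Cons 1 (Mu (\x -> Cons 2 (Var x))))
s2 = Hide (Mu (\x -> Cons 1 (Cons 2 (Var x))))

joinPStream :: PStream a (PStream a v) -> PStream a v
joinPStream (Var v)     = v
joinPStream (Mu g)      = Mu (joinPStream . g . Var)
joinPStream (Cons x xs) = Cons x (joinPStream xs)

unrollStream :: Stream a -> Stream a
unrollStream s = Hide (joinPStream (punroll (reveal s)))

-- No Var case: a variable cannot occur unbound at the top level.
punroll :: PStream a (PStream a v) -> PStream a (PStream a v)
punroll (Mu g)      = g (joinPStream (Mu g))
punroll (Cons x xs) = Cons x (punroll xs)

-- Hypothetical pretty-printer: variables are instantiated to names,
-- and n counts the binders entered so far.
pshow :: Show a => Int -> PStream a String -> String
pshow _ (Var x)     = x
pshow n (Mu g)      = "Mu (\\" ++ v ++ " -> " ++ pshow (n + 1) (g v) ++ ")"
  where v = varName n
pshow n (Cons x xs) = show x ++ " : " ++ pshow n xs

varName :: Int -> String
varName n
  | n < 26    = [toEnum (fromEnum 'a' + n)]
  | otherwise = 'v' : show n

instance Show a => Show (Stream a) where
  show s = pshow 0 (reveal s)
```

With this printer, show (unrollStream s2) yields 1 : 2 : Mu (\a -> 1 : 2 : a).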

Tail of a Stream So far all the operations that have been presented 
have remarkably simple and high-level definitions in comparison 
with imperative algorithms on cyclic structures. However, certain 
operations are not as simple to define. For example Ghani et al. [15] 
consider defining the tail of a cyclic stream. They observe that a possible implementation of this function should rotate the stream when the head is part of a cycle. For example, taking the tail of s2 should result in:

*Streams> stail s2
Mu (\a -> 2 : 1 : a)

The implementation of a tail of streams is presented in Figure 5. 
The Cons case is trivial, but in the Mu case the elements in 
the cycle must be rotated. The basic idea is to substitute [x ↦ Cons (phead (g x)) x] in g. This has the effect of putting the
head of the original stream (g x) in the last element before the 
variable. The new stream is formed by skipping the first element 
and using joinPStream in a final step to perform the substitution. 
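A runnable version of Figure 5 makes the rotation visible. It includes joinPStream and a small pretty-printer (pshow, varName and the Show instance), which are assumptions of this sketch and not code from the paper; ptail is lifted to the top level with an explicit type signature, but its equations follow Figure 5.

```haskell
{-# LANGUAGE RankNTypes #-}

data PStream a v = Var v | Mu (v -> PStream a v) | Cons a (PStream a v)

newtype Stream a = Hide { reveal :: forall v. PStream a v }

s1, s2 :: Stream Int
s1 = Hide (Cons 1 (Mu (\x -> Cons 2 (Var x))))
s2 = Hide (Mu (\x -> Cons 1 (Cons 2 (Var x))))

joinPStream :: PStream a (PStream a v) -> PStream a v
joinPStream (Var v)     = v
joinPStream (Mu g)      = Mu (joinPStream . g . Var)
joinPStream (Cons x xs) = Cons x (joinPStream xs)

stail :: Stream a -> Stream a
stail s = Hide (joinPStream (ptail (reveal s)))

-- In a cycle, substitute [x |-> Cons (phead (g x)) x] in g, so the old head
-- becomes the last element before the back edge, then drop the first element.
ptail :: PStream a (PStream a v) -> PStream a (PStream a v)
ptail (Cons _ xs) = xs
ptail (Mu g)      = Mu (\x ->
  let phead (Mu h)     = phead (h x)
      phead (Cons y _) = y
  in ptail (g (Cons (phead (g x)) x)))

-- Hypothetical pretty-printer used to inspect results.
pshow :: Show a => Int -> PStream a String -> String
pshow _ (Var x)     = x
pshow n (Mu g)      = "Mu (\\" ++ v ++ " -> " ++ pshow (n + 1) (g v) ++ ")"
  where v = varName n
pshow n (Cons x xs) = show x ++ " : " ++ pshow n xs

varName :: Int -> String
varName n
  | n < 26    = [toEnum (fromEnum 'a' + n)]
  | otherwise = 'v' : show n

instance Show a => Show (Stream a) where
  show s = pshow 0 (reveal s)
```

Here show (stail s2) gives Mu (\a -> 2 : 1 : a), and applying stail twice brings the cycle back to Mu (\a -> 1 : 2 : a).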

4.2 Cyclic Binary Trees and Cross Edges 

The datatype for cyclic binary trees is as follows: 

data PTree a v =
    Var v
  | Mu ([v] -> [PTree a v])
  | Empty
  | Fork a (PTree a v) (PTree a v)

newtype Tree a = Hide { reveal :: forall v. PTree a v }

The main difference to the datatype of streams (besides the tree- 
specific constructors Empty and Fork) is the need for the recursive 
multi-binder introduced in Section 3.2. With a simple recursive 
binder, it is only possible to model back edges such as: 

t1 = Hide (Mu (\ ~(x : _) ->
  [Fork 1 (Fork 2 (Var x) Empty) (Var x)]))

In this case the reference x points back at the root. 

Expressive Cross Edges Recursive multi-binders offer several 
expressiveness benefits over simple recursive binders. Namely, it
becomes possible to express: 1) cross edges between nodes in 
neighbouring trees and 2) cross edges in both directions (mutual 
recursion). As an example illustrating this expressiveness consider 
the following tree: 

t2 = Hide (Mu (\ ~(x : y : _) ->
  [Fork 1 (Var y) (Var x), Fork 2 (Var x) (Var y)]))

This tree has two cyclic references x and y for the subtrees 
Fork 1 (Var y) (Var x) and Fork 2 (Var x) (Var y). In 
the first subtree the reference x is a back edge because it points 
back at itself, whereas the reference y is a cross edge because it 
points at the neighbouring subtree. A similar thing happens in the 
second subtree, only this time in reverse: the reference x is a cross 
edge and the reference y is a back edge. Note that this example 
requires mutual recursion, because the two subtrees are defined in 
terms of each other. 

Operations on Cyclic Binary Trees Nearly all the operations 
defined for cyclic streams have a corresponding definition on cyclic 
binary trees.² Figure 6 shows those definitions. Most operations are
defined in a similar way to the equivalent operations on streams. 

The main difference to the definitions on streams lies in the 
treatment of Mu binders. Because trees use recursive multi- 
binders, operations need to be generalized to account for a list of 
inputs and a list of outputs. To ensure that the input list has at least 
as many elements as the output list, we often produce an infinite 
list. For example, in foldTree the input to the generator function g 
is the infinite list repeat k1, and in peq the generator functions are
provided with the list iterate succ n. 
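The following self-contained sketch gathers the tree definitions: t1 and t2 from this section, plus the fold, cyclic fold, mapping and structural equality of Figure 6, again writing the embedding and projection as Hide and reveal. Since trans in foldTree passes repeat k1 to every binder and only interprets the head of the resulting list, every variable, back edge or cross edge alike, contributes k1.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Function (fix)

data PTree a v
  = Var v
  | Mu ([v] -> [PTree a v])
  | Empty
  | Fork a (PTree a v) (PTree a v)

newtype Tree a = Hide { reveal :: forall v. PTree a v }

-- Back edges only: x points at the root.
t1 :: Tree Int
t1 = Hide (Mu (\ ~(x : _) -> [Fork 1 (Fork 2 (Var x) Empty) (Var x)]))

-- Cross edges: x and y refer to each other's subtrees.
t2 :: Tree Int
t2 = Hide (Mu (\ ~(x : y : _) -> [Fork 1 (Var y) (Var x), Fork 2 (Var x) (Var y)]))

foldTree :: (a -> b -> b -> b) -> b -> b -> Tree a -> b
foldTree f k1 k2 s = trans (reveal s)
  where
    trans (Var x)      = x
    trans (Mu g)       = head (map trans (g (repeat k1)))
    trans Empty        = k2
    trans (Fork x l r) = f x (trans l) (trans r)

cfoldTree :: (a -> b -> b -> b) -> b -> Tree a -> b
cfoldTree f k s = trans (reveal s)
  where
    trans (Var x)      = x
    trans (Mu g)       = head (fix (map trans . g))  -- knot over all bindings
    trans Empty        = k
    trans (Fork x l r) = f x (trans l) (trans r)

tmap :: (a -> b) -> Tree a -> Tree b
tmap f s = Hide (pmap f (reveal s))
  where
    pmap _ (Var x)      = Var x
    pmap h (Mu g)       = Mu (map (pmap h) . g)
    pmap _ Empty        = Empty
    pmap h (Fork x l r) = Fork (h x) (pmap h l) (pmap h r)

instance Eq a => Eq (Tree a) where
  s == t = peq 0 (reveal s) (reveal t)

-- A multi-binder consumes as many fresh labels as it binds.
peq :: Eq a => Int -> PTree a Int -> PTree a Int -> Bool
peq _ (Var x) (Var y) = x == y
peq n (Mu f) (Mu g) =
  let l1 = f (iterate succ n)
      l2 = g (iterate succ n)
  in and (zipWith (peq (n + length l2)) l1 l2)
peq _ Empty Empty = True
peq n (Fork x1 l1 r1) (Fork x2 l2 r2) =
  x1 == x2 && peq n l1 l2 && peq n r1 r2
peq _ _ _ = False
```

For example, take 4 (cfoldTree (\x l _ -> x : l) [] t2) follows left children through the cross edges and gives [1,2,1,2].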

The operation that needs a little extra work, in comparison
to the equivalent definition on streams, is structural equality. To 



Fold:

foldTree :: (a -> b -> b -> b) -> b -> b -> Tree a -> b
foldTree f k1 k2 s = trans (reveal s) where
  trans (Var x)      = x
  trans (Mu g)       = head (map trans (g (repeat k1)))
  trans Empty        = k2
  trans (Fork x l r) = f x (trans l) (trans r)

Cyclic fold:

cfoldTree :: (a -> b -> b -> b) -> b -> Tree a -> b
cfoldTree f k s = trans (reveal s) where
  trans (Var x)      = x
  trans (Mu g)       = head (fix (map trans . g))
  trans Empty        = k
  trans (Fork x l r) = f x (trans l) (trans r)
Mapping: 

tmap :: (a — ¥ b) — ¥ Tree a — > Tree b 
tmap f s — 4- (pmap f ( f s)) where 

pmap f ( Var x) — Var x 

pmap f (Mu (?) = Mu (map (pmap f) o g) 

pmap f Empty — Empty 

pmap f (Fork x I r) — Fork (f x) (pmap f I) (pmap f r) 

Structural Equality: 

instance Eq a Eq (Tree a) where 
U = t 2 =peqO(tt 1 ) (tt 2 ) 

peq :: Eq a => hit — > PTree a Int — ¥ PTree a Int — ¥ Bool 
peq _ ( Var x) ( Var y) — x = y 

peq n (Mu f) (Mu g) = 

let U — f (iterate succ n) 
h — g (iterate succ n) 

in and $ zipWith (peq (n + length h)) li 1% 
peq n Empty Empty = True 

peq n (Fork xj U n) (Fork x>> lz r 2 ) = 

xt = xa A peq n h lg A peq n r t r a 
peq _ = False 

Quasi-monadic join on PTree a: 

pjoin :: PTree a (PTree a v) — ¥ PTree a v 
pjoin ( Var v) = v 

pjoin (Mu g) = Mu (map pjoin o g o map Var) 

pjoin Empty — Empty 

pjoin (Fork x I r) = Fork x (pjoin I) (pjoin r) 

Unrolling: 

unrollTree :: Tree a — ¥ Tree a 
unrollTree s = 4- (pjoin (unroll ( j" s))) 

unroll :: PTree a (PTree a v) — ¥ PTree a (PTree a v) 
unroll (Mu g) — head (g (repeat (pjoin (Mu <?)))) 
unroll Empty = Empty 
unroll (Fork x I r) — Fork x (unroll I) (unroll r) 



Figure 6. Operations on cyclic trees. 



2 An exception is the stream tail operation, which is specific to streams. 



Mapping laws:

  smap id = id
  smap f . smap g = smap (f . g)
  tmap id = id
  tmap f . tmap g = tmap (f . g)

Fold fusion (cyclic streams):

  Assume f strict, f a = b and f (g x y) = h x (f y) for all x y,
  then:

  f . foldStream g a = foldStream h b



Figure 7. Some laws about operations on structured graphs. 

account for the fact that recursive multi-binders Mu may bind
several variables at once, the next fresh variable must be updated
accordingly. Since an integer is used to produce fresh variables,
and it is known how many new variable labels have been generated
(length l1), the next label is n + length l1. The elements in the
two output lists l1 and l2 are compared by zipping the two lists with
peq (n + length l1) and then checking that all comparisons have
returned True.
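The fresh-label discipline can also be tried out in isolation. The sketch below repeats the PTree definitions so that it runs on its own. Hide and reveal again stand in for ↓ and ↑, and t2' is a hypothetical variant of t2 in which the two references of the root are swapped. Both sides of a Mu receive the same labels n, n+1, ..., so corresponding references in the two graphs carry the same labels.

```haskell
{-# LANGUAGE RankNTypes #-}

data PTree a v
  = Var v
  | Mu ([v] -> [PTree a v])
  | Empty
  | Fork a (PTree a v) (PTree a v)

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Tree a = Hide { reveal :: forall v. PTree a v }

instance Eq a => Eq (Tree a) where
  s1 == s2 = peq 0 (reveal s1) (reveal s2)

-- Variables are instantiated to Int labels; n is the next fresh label.
peq :: Eq a => Int -> PTree a Int -> PTree a Int -> Bool
peq _ (Var x) (Var y) = x == y
peq n (Mu f) (Mu g) =
  let l1 = f (iterate succ n) -- both binders see labels n, n+1, ...
      l2 = g (iterate succ n)
  in and (zipWith (peq (n + length l1)) l1 l2)
peq _ Empty Empty = True
peq n (Fork x1 l1 r1) (Fork x2 l2 r2) =
  x1 == x2 && peq n l1 l2 && peq n r1 r2
peq _ _ _ = False

t2, t2' :: Tree Int
t2 = Hide (Mu (\ ~(x : y : _) ->
  [Fork 1 (Var y) (Var x), Fork 2 (Var x) (Var y)]))
-- Hypothetical variant: the root's back edge and cross edge are swapped.
t2' = Hide (Mu (\ ~(x : y : _) ->
  [Fork 1 (Var x) (Var y), Fork 2 (Var x) (Var y)]))
```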

A final remark concerns the interpretation of the Mu binders.
Here, the head of the output list has a special role: it is interpreted
as the root of the tree. All other trees are auxiliary definitions
that model the structure of the root tree. This interpretation is similar
to letrec in our interpreter in Section 3.2. There, the head (which
represented the body of letrec) was also treated specially. However,
an alternative interpretation is to treat all trees equally, without
preference for any of them. This interpretation uses a forest of
trees, or a multi-rooted tree. The latter interpretation is sometimes
useful for working with cyclic structures. An example of this is the
model for grammars in Section 6.
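The two readings of Mu differ only in what is done with the output list. The sketch below illustrates this with cfoldForest, a hypothetical function that does not appear in the paper. It folds cyclically like cfoldTree, but it returns one result per definition of the top-level binder, which corresponds to the forest reading. Nested binders still use the head-only reading, and a tree without a top-level binder is treated as a single root. Hide and reveal are assumed names for ↓ and ↑.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Function (fix)

data PTree a v
  = Var v
  | Mu ([v] -> [PTree a v])
  | Empty
  | Fork a (PTree a v) (PTree a v)

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Tree a = Hide { reveal :: forall v. PTree a v }

-- Hypothetical multi-rooted reading: every definition of the top-level
-- Mu is a root, and all of them are folded cyclically.
cfoldForest :: (a -> b -> b -> b) -> b -> Tree a -> [b]
cfoldForest f k s = case reveal s of
    Mu g -> fix (map trans . g) -- all roots, not just the head
    t -> [trans t]              -- no top-level binder: a single root
  where
    trans (Var x) = x
    trans (Mu g) = head (fix (map trans . g)) -- nested binders: root only
    trans Empty = k
    trans (Fork x l r) = f x (trans l) (trans r)

t2 :: Tree Int
t2 = Hide (Mu (\ ~(x : y : _) ->
  [Fork 1 (Var y) (Var x), Fork 2 (Var x) (Var y)]))
```

Applied to t2 with f = \x _ _ -> x, the forest reading returns [1, 2]. The head-only reading would return just 1.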

4.3 Reasoning about Structured Graphs 

One important benefit of structured graphs is that standard
functional programming reasoning techniques can be used to reason
about programs. In particular, properties of several of the operations
defined in this section are provable by structural induction.

Figure 7 illustrates adaptations of typical laws for maps and 
folds to their corresponding structured graph operations. All these 
laws are proved by structural induction on the PStream and PTree
datatypes. To do so, the definitions, including smap, tmap and 
foldStream, must be unfolded to reveal the underlying definitions 
that operate on PStream or PTree. The structural induction itself 
is standard, with the exception of the Mu case. 

We illustrate the proof technique in more detail on the map 
fusion law for Tree: 

tmap f . tmap g = tmap (f . g)

This equation is proved in terms of the following property on
pmap:

pmap f . pmap g = pmap (f . g)

We rewrite this equation to pointwise form:

pmap f (pmap g x) = pmap (f . g) x

Now structural induction on x applies. There are 4 cases. The Var 
and Empty cases are trivial. The Fork case is standard and not 
interesting. The only interesting case is the Mu case: 

  pmap f (pmap g (Mu h))
=   {- Definition of pmap -}
  pmap f (Mu (map (pmap g) . h))
=   {- Definition of pmap -}
  Mu (map (pmap f) . map (pmap g) . h)
=   {- map fusion (on lists) -}
  Mu (map (pmap f . pmap g) . h)
=   {- Induction hypothesis -}
  Mu (map (pmap (f . g)) . h)
=   {- Definition of pmap -}
  pmap (f . g) (Mu h)

Some proofs also need parametricity arguments. This is the case 
for the fold fusion law for foldStream. The proof requires a para- 
metricity argument stating that the values appearing in the variable 
case must be the same as the values passed to the generator func- 
tion. However, it is possible to avoid this parametricity argument 
with an alternative definition of foldStream: 

foldStream :: (a -> b -> b) -> b -> Stream a -> b
foldStream f k = pfoldStream . ↑ where
  pfoldStream (Var x)     = k
  pfoldStream (Mu g)      = pfoldStream (g ())
  pfoldStream (Cons x xs) = f x (pfoldStream xs)

By instantiating the abstract type for variables to the unit type and
using k directly in the variable case, we avoid the parametricity
argument. The proof is then a simple structural induction, similar
to the one used for tmap fusion.
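The alternative definition can be run directly. The sketch below assumes the single-variable stream binders of Section 4.1, where Mu takes a function v -> PStream a v. It uses the hypothetical names Hide and reveal for ↓ and ↑. It also checks one concrete instance of the fold fusion law of Figure 7, with f = (* 2), g = (+), a = b = 0 and h x z = 2 * x + z. These choices satisfy the premises: (* 2) is strict, (* 2) 0 = 0, and 2 * (x + y) = 2 * x + 2 * y. Checking one instance is no substitute for the inductive proof.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Cyclic streams with single-variable binders (as in Section 4.1).
data PStream a v
  = Var v
  | Mu (v -> PStream a v)
  | Cons a (PStream a v)

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Stream a = Hide { reveal :: forall v. PStream a v }

-- Alternative foldStream: variables are instantiated to (), and the
-- variable case returns k directly.
foldStream :: (a -> b -> b) -> b -> Stream a -> b
foldStream f k s = pfoldStream (reveal s)
  where
    pfoldStream (Var _) = k
    pfoldStream (Mu g) = pfoldStream (g ())
    pfoldStream (Cons x xs) = f x (pfoldStream xs)

-- The cyclic stream 1 : 2 : 1 : 2 : ...
onetwo :: Stream Int
onetwo = Hide (Mu (\s -> Cons 1 (Cons 2 (Var s))))
```

Here foldStream (+) 0 onetwo evaluates to 3. The cycle is entered once, and the variable case contributes k = 0.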

Finally, note that not all operations are inductive (cyclic folds,
for example). Nevertheless, we expect that other techniques, such as
coinduction or fixpoint-based reasoning, can be used to reason
about such definitions.

5. Generic Structured Graphs 

The similarity between the operations on cyclic streams and trees 
leads naturally to the question of whether there is a more generic 
way to define structured graphs. This section shows that by using 
some lightweight datatype-generic programming [17] techniques it 
is possible to define highly reusable combinators for manipulating 
structured graphs of different types. These combinators provide us 
with a framework which end-users can use to define their own 
domain-specific programs using structured graphs. Because the
combinators hide most of the complexity of the PHOAS-based
representation, they help lower the entry cost for users. Often,
using combinators, it is possible to write programs on structured 
graphs that are no more complex than programs on conventional 
algebraic datatypes. 

5.1 A Generic Representation for Structured Graphs 

A generic datatype for structured graphs can be defined as follows: 

data Rec f a = Var a
             | Mu ([a] -> [f (Rec f a)])
             | In (f (Rec f a))

newtype Graph f = ↓ { ↑ :: forall a. Rec f a }

This representation separates the datatype-specific parts of
structured graphs from the generic binding infrastructure (the
constructors Var and Mu). The idea is to parametrize the datatype-
specific parts with a type-constructor f, which is similar to the
functors used in various simple datatype-generic programming
approaches [17, 23].

Streams Revisited To recover streams, the type-constructor f is
instantiated as follows:



class Functor f where
  fmap :: (a -> b) -> f a -> f b

class (Functor f, Foldable f) => Traversable f where
  traverse :: Applicative i => (a -> i b) -> f a -> i (f b)



Figure 8. The Functor and Traversable type classes. 

data StreamF a r = Cons a r 

deriving (Functor, Foldable, Traversable) 

type Stream a = Graph (StreamF a) 

Values of this type are defined almost exactly as those in Section 4.1.
For example, the cyclic stream onetwo (1 : 2 : ...) is defined as
follows:

onetwo = ↓ (Mu (\ ~(s : _) ->
           [Cons 1 (In (Cons 2 (Var s)))]))

The only difference is that additional In constructors are needed at 
each recursive step. 

Trees Revisited To recover cyclic trees, the functor f is instantiated
as follows:

data TreeF a r = Empty | Fork a r r 

deriving (Functor, Foldable, Traversable) 

type Tree a = Graph (TreeF a) 

An example of a cyclic tree is: 

tree = ↓ (Mu (\ ~(t1 : t2 : t3 : _) -> [
         Fork 1 (In (Fork 4 (Var t2) (In Empty))) (Var t3),
         Fork 2 (Var t1) (Var t3),
         Fork 3 (Var t2) (Var t1)]))

Requirements on Functors Note that the definitions of StreamF
and TreeF derive some classes. In general, Functor, Foldable
and Traversable instances are required for the functors f used
in Graph f. These classes provide useful methods to define our
generic combinators. For reference, Figure 8 shows simplified
versions of the Functor and Traversable classes. Here we only
define the methods fmap and traverse, which are needed in Sec- 
tion 6. The function fmap is a generalization of the map func- 
tion for containers, and traverse is an effectful variation of fmap 
for applicative effects [26]. The definitions of the Foldable and 
Applicative classes are omitted. Recent versions of the GHC 
compiler can derive the instances of Functor, Foldable and 
Traversable mechanically using an extension of the derivable 
type-classes mechanism. More information about those classes can 
be found in work by McBride and Paterson [26] or Gibbons and 
Oliveira [18]. 

Forbidding empty cycles An additional advantage of this repre- 
sentation is that it prevents empty cycles. With the datatypes used 
for streams and trees in Section 4, empty cycles such as: 

empty = ↓ (Mu Var)

were allowed. One problem with such empty cycles is that there 
is no (infinite) stream or tree that corresponds to that value. Such 
junk values are not desirable and should be forbidden. Fortunately, 
our generic representation offers a good solution for this problem. 
The idea is to interleave the functor f with recursive occurrences
of Rec f a. This is used in Mu, which requires a function of type
[v] -> [f (Rec f v)] as an argument. Meaningless expressions,
e.g.

empty = ↓ (Mu (\ ~(x : _) -> [Var x]))  -- type error



gfold :: Functor f => (t -> c) -> (([t] -> [c]) -> c) ->
         (f c -> c) -> Graph f -> c
gfold v l f = trans . ↑ where
  trans (Var x) = v x
  trans (Mu g)  = l (map (f . fmap trans) . g)
  trans (In fa) = f (fmap trans fa)

fold :: Functor f => (f c -> c) -> c -> Graph f -> c
fold alg k = gfold id (\g -> head (g (repeat k))) alg

cfold :: Functor f => (f t -> t) -> Graph f -> t
cfold = gfold id (head . fix)

sfold :: (Eq t, Functor f) => (f t -> t) -> t -> Graph f -> t
sfold alg k = gfold id (head . fixVal (repeat k)) alg

fixVal :: Eq a => a -> (a -> a) -> a
fixVal v f = if v == v' then v else fixVal v' f
  where v' = f v



Figure 9. Generic graph folds. 

are not well typed because Var is a constructor of Rec f v and
not of f (Rec f v). In other words, a constructor of the specific
structure (given by the functor f) must always be used first.

5.2 Generic Operations 

The operations defined on streams or trees can be made generic. 

Generalizing Folds Figure 9 shows a little library of fold-like
combinators. All folds in the figure are instances of gfold. The
function gfold generalizes several fold-like functions presented in
Section 4 along two dimensions:

• Graph-generic: Rather than depending on a particular graph
structure like streams or trees, gfold is parametrized by a
functor f, which abstracts over the particular graph structure.
This type of generalization is a form of datatype-generic
programming, which we call 'graph-generic' instead of 'datatype-
generic' to emphasize the use of structured graphs rather than
plain algebraic datatypes.

• Fixpoint-parametrized: As illustrated in Section 4, there are a
few variations of folds (for example, regular and cyclic folds).
The main difference lies in the treatment of the recursive binder
(Mu). The function gfold generalizes such folds by parametrizing
the treatment of the fixpoint using the function l.

The functions fold, cfold and sfold are graph-generic variations 
of folds, but with specific treatments of the recursive binder. The 
function fold is the graph-generic version of folds like foldStream 
or foldTree. Correspondingly, the function cfold is the graph- 
generic version of cyclic folds like cfoldStream or cfoldTree. The 
more generic combinators support simpler definitions of the elems 
and toList functions: 

elems :: Stream a -> [a]
elems = fold streamf2list []

toList :: Stream a -> [a]
toList = cfold streamf2list

streamf2list :: StreamF a [a] -> [a]
streamf2list (Cons x xs) = x : xs

Finally, the sfold function is yet another variant of fold-like
operations. It uses a special fixpoint operation fixVal, which iterates
a function from an initial value until two successive approximations
are equal. This works for monotonic functions and values that support
a comparison operation (==). This combinator is used in Section 6.
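The following is a self-contained sketch of the generic representation and the folds of Figure 9. It assumes GHC with the RankNTypes and DeriveTraversable extensions. Hide and reveal are hypothetical ASCII names for ↓ and ↑, and the algebra argument of gfold is renamed alg to avoid confusion with the functor f. The sfold example checks a made-up property: whether the stream contains an element greater than 1.

```haskell
{-# LANGUAGE RankNTypes, DeriveTraversable #-}
import Data.Function (fix)

data Rec f a = Var a | Mu ([a] -> [f (Rec f a)]) | In (f (Rec f a))

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Graph f = Hide { reveal :: forall a. Rec f a }

-- v handles variables, l handles the fixpoint of a Mu binder,
-- alg is the algebra for the structure-specific layer.
gfold :: Functor f => (t -> c) -> (([t] -> [c]) -> c) -> (f c -> c) -> Graph f -> c
gfold v l alg x = trans (reveal x)
  where
    trans (Var y) = v y
    trans (Mu g) = l (map (alg . fmap trans) . g)
    trans (In fa) = alg (fmap trans fa)

fold :: Functor f => (f c -> c) -> c -> Graph f -> c
fold alg k = gfold id (\g -> head (g (repeat k))) alg

cfold :: Functor f => (f t -> t) -> Graph f -> t
cfold = gfold id (head . fix)

sfold :: (Eq t, Functor f) => (f t -> t) -> t -> Graph f -> t
sfold alg k = gfold id (head . fixVal (repeat k)) alg

-- Iterate from v until two successive approximations are equal.
fixVal :: Eq a => a -> (a -> a) -> a
fixVal v f = if v == v' then v else fixVal v' f
  where v' = f v

data StreamF a r = Cons a r
  deriving (Functor, Foldable, Traversable)

type Stream a = Graph (StreamF a)

elems, toList :: Stream a -> [a]
elems = fold streamf2list []
toList = cfold streamf2list

streamf2list :: StreamF a [a] -> [a]
streamf2list (Cons x xs) = x : xs

onetwo :: Stream Int
onetwo = Hide (Mu (\ ~(s : _) -> [Cons 1 (In (Cons 2 (Var s)))]))
```

Running this sketch shows three behaviours:
- elems onetwo stops at the back edge and yields [1, 2].
- toList onetwo unfolds the cycle lazily, so take 5 of it gives [1, 2, 1, 2, 1].
- The sfold example converges after a few iterations.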



Generic transformations on graphs:

type f ~> g = forall a. f a -> g a

transform :: (Functor f, Functor g) =>
             (f ~> g) -> Graph f -> Graph g
transform f x = ↓ (hmap (↑ x)) where
  hmap (Var x) = Var x
  hmap (Mu g)  = Mu (map (f . fmap hmap) . g)
  hmap (In x)  = In (f (fmap hmap x))

Generic mapping on graph containers:

class BiFunctor f where
  bimap :: (a -> c) -> (b -> d) -> f a b -> f c d

gmap :: (BiFunctor f, Functor (f a), Functor (f b)) =>
        (a -> b) -> Graph (f a) -> Graph (f b)
gmap f = transform (bimap f id)

Generic quasi-monadic join:

pjoin :: Functor f => Rec f (Rec f a) -> Rec f a
pjoin (Var x) = x
pjoin (Mu g)  = Mu (map (fmap pjoin) . g . map Var)
pjoin (In r)  = In (fmap pjoin r)

Generic unrolling:

unrollGraph :: Functor f => Graph f -> Graph f
unrollGraph g = ↓ (pjoin (unroll (↑ g)))

unroll :: Functor f => Rec f (Rec f a) -> Rec f (Rec f a)
unroll (Var x) = Var x
unroll (Mu g)  = In (head (g (repeat (pjoin (Mu g)))))
unroll (In r)  = In (fmap unroll r)



Figure 10. Generic graph transformations 



Generalizing Transformations Figure 10 shows a little library 
of transformation combinators. An important operation is the 
transform function. This function transforms a graph with a
structure f into a graph with a structure g using a natural
transformation f ~> g. Note that in categorical terms the auxiliary function
hmap is a functorial map operation, but in a category with func- 
tors as objects and natural transformations as arrows. The function 
transform can be used, for example, to convert a tree or a stream 
into a graph structure VGraph that can be rendered into a graphical 
representation of the corresponding graph. 

data VGraphF a = VNode String [a]
  deriving (Show, Functor, Foldable, Traversable)

type VGraph = Graph VGraphF

btree2vgraph :: Show a => Tree a -> VGraph
btree2vgraph = transform trans where
  trans Empty        = VNode "" []
  trans (Fork x l r) = VNode (show x) [l, r]

Another operation that can be defined with transform is a generic 
mapping operation (gmap) on graph containers. The gmap func- 
tion requires container-like type constructors such as StreamF or 
TreeF to be instances of the class BiFunctor. Note that such 
BiFunctor requirements are standard for this kind of container
structure [23].
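Here is a runnable sketch of transform and gmap on streams. It assumes GHC with RankNTypes, TypeOperators, ScopedTypeVariables and FlexibleContexts. Hide and reveal are hypothetical names for ↓ and ↑. The BiFunctor class is the paper's own; base's Data.Bifunctor is similar but is not used here. To keep the block self-contained, toList is written directly by pattern matching. The local hmap has an explicit polymorphic signature, which keeps it generalized even under MonoLocalBinds.

```haskell
{-# LANGUAGE RankNTypes, TypeOperators, ScopedTypeVariables #-}
{-# LANGUAGE FlexibleContexts, DeriveFunctor #-}
import Data.Function (fix)

data Rec f a = Var a | Mu ([a] -> [f (Rec f a)]) | In (f (Rec f a))

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Graph f = Hide { reveal :: forall a. Rec f a }

type f ~> g = forall a. f a -> g a

-- Apply a natural transformation at every node, keeping the binders.
transform :: forall f g. (Functor f, Functor g) => (f ~> g) -> Graph f -> Graph g
transform nt x = Hide (hmap (reveal x))
  where
    hmap :: Rec f a -> Rec g a
    hmap (Var y) = Var y
    hmap (Mu h) = Mu (map (nt . fmap hmap) . h)
    hmap (In y) = In (nt (fmap hmap y))

class BiFunctor f where
  bimap :: (a -> c) -> (b -> d) -> f a b -> f c d

gmap :: (BiFunctor f, Functor (f a), Functor (f b)) => (a -> b) -> Graph (f a) -> Graph (f b)
gmap h = transform (bimap h id)

data StreamF a r = Cons a r deriving (Functor)

instance BiFunctor StreamF where
  bimap f g (Cons x r) = Cons (f x) (g r)

-- Cyclic unfolding of a stream into a (lazy, infinite) list.
toList :: Graph (StreamF a) -> [a]
toList s = go (reveal s)
  where
    go (Var xs) = xs
    go (Mu g) = head (fix (map step . g))
    go (In fa) = step fa
    step (Cons x r) = x : go r

onetwo :: Graph (StreamF Int)
onetwo = Hide (Mu (\ ~(s : _) -> [Cons 1 (In (Cons 2 (Var s)))]))
```

Because transform rebuilds each Mu around the same generator, the cycle is preserved. For example, take 4 (toList (gmap (* 10) onetwo)) gives [10, 20, 10, 20].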

Finally, a different type of transformation is a generic version 
of the quasi-monadic join operation (pjoin). The function pjoin is 
a straightforward generalization of the corresponding function on 
streams and trees. A generic version of unrolling (unrollGraph) 
can be defined in terms of pjoin. Notably the unrollGraph trans- 



formation alters the graph-sharing shape: in the Mu case a value 
built using an In data constructor is returned. This is in contrast
with operations of the transform family, which preserve the origi- 
nal graph-sharing structure. 
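The effect of unrollGraph on sharing can be observed with a small sketch. It reuses the definitions of Figure 10, with the hypothetical names Hide and reveal for ↓ and ↑. It adds a Var case to unroll so that the function is total. It also introduces rootIsMu, a hypothetical helper that reports whether the root of a graph is a binder. After one unrolling step the root becomes an In node, while the denoted stream stays the same.

```haskell
{-# LANGUAGE RankNTypes, DeriveFunctor #-}
import Data.Function (fix)

data Rec f a = Var a | Mu ([a] -> [f (Rec f a)]) | In (f (Rec f a))

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Graph f = Hide { reveal :: forall a. Rec f a }

pjoin :: Functor f => Rec f (Rec f a) -> Rec f a
pjoin (Var x) = x
pjoin (Mu g) = Mu (map (fmap pjoin) . g . map Var)
pjoin (In r) = In (fmap pjoin r)

-- One unrolling step: the outermost Mu is replaced by its root node,
-- whose variables now refer to the re-joined original cycle.
unroll :: Functor f => Rec f (Rec f a) -> Rec f (Rec f a)
unroll (Var x) = Var x
unroll (Mu g) = In (head (g (repeat (pjoin (Mu g)))))
unroll (In r) = In (fmap unroll r)

unrollGraph :: Functor f => Graph f -> Graph f
unrollGraph g = Hide (pjoin (unroll (reveal g)))

data StreamF a r = Cons a r deriving (Functor)

toList :: Graph (StreamF a) -> [a]
toList s = go (reveal s)
  where
    go (Var xs) = xs
    go (Mu g) = head (fix (map step . g))
    go (In fa) = step fa
    step (Cons x r) = x : go r

-- Hypothetical helper: is the root of the graph a Mu binder?
isMu :: Rec f a -> Bool
isMu (Mu _) = True
isMu _ = False

rootIsMu :: Graph f -> Bool
rootIsMu g = isMu (reveal g)

onetwo :: Graph (StreamF Int)
onetwo = Hide (Mu (\ ~(s : _) -> [Cons 1 (In (Cons 2 (Var s)))]))
```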

5.3 Ad-hoc Generic Operations 

While many operations can be captured with generic recursion pat- 
tern combinators like gfold and transform, some operations may 
require less common types of recursion patterns. While it is possi- 
ble to add a large number of general purpose recursion patterns to 
our library, this introduces some additional end-user cost because 
users have to learn when and how to use the recursion patterns 
(which is not trivial). A less general, but more pragmatic approach
consists of using type-classes to separate the generic processing parts
of a specific operation from the structure-specific parts of that
operation. We illustrate this technique on two operations: generic
structural equality and generic pretty printing.

Equality A generic version of structural equality can be defined 
by the geq function: 

geq :: EqF f => Graph f -> Graph f -> Bool
geq g1 g2 = eqRec 0 (↑ g1) (↑ g2)

eqRec :: EqF f => Int -> Rec f Int -> Rec f Int -> Bool
eqRec _ (Var x) (Var y) = x == y
eqRec n (Mu g) (Mu h) =
  let a = g (iterate succ n)
      b = h (iterate succ n)
  in and $ zipWith (eqF (eqRec (n + length a))) a b
eqRec n (In x) (In y) = eqF (eqRec n) x y
eqRec _ _ _ = False

The function eqRec deals with the generic binding structure, while 
the type-class EqF provides equality for the structure-specific parts 
of the graph: 

class Functor f => EqF f where
  eqF :: (r -> r -> Bool) -> f r -> f r -> Bool

The type r is treated as an abstract type and the recursive call to deal 
with values of type r is explicitly provided. This avoids leaking im- 
plementation details of equality (dealing with fresh variables) to the 
code users have to write. Writing instances of equality for graphs 
is no more difficult than writing structural equality on conventional 
algebraic datatypes: 

instance Eq a => EqF (StreamF a) where
  eqF eq (Cons x xs) (Cons y ys) = x == y && eq xs ys
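A self-contained sketch of geq for streams follows. It assumes the hypothetical names Hide and reveal for ↓ and ↑, together with made-up example streams. Note that geq compares structure, not denotation. The stream onetwo2 spells out the cycle 1 : 2 twice before looping back. It denotes the same infinite stream as onetwo, but it is not structurally equal to it.

```haskell
{-# LANGUAGE RankNTypes, DeriveFunctor #-}

data Rec f a = Var a | Mu ([a] -> [f (Rec f a)]) | In (f (Rec f a))

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Graph f = Hide { reveal :: forall a. Rec f a }

-- Structure-specific equality; the recursive comparison is passed in.
class Functor f => EqF f where
  eqF :: (r -> r -> Bool) -> f r -> f r -> Bool

geq :: EqF f => Graph f -> Graph f -> Bool
geq g1 g2 = eqRec 0 (reveal g1) (reveal g2)

-- Generic part: binders are opened with the same fresh Int labels.
eqRec :: EqF f => Int -> Rec f Int -> Rec f Int -> Bool
eqRec _ (Var x) (Var y) = x == y
eqRec n (Mu g) (Mu h) =
  let a = g (iterate succ n)
      b = h (iterate succ n)
  in and (zipWith (eqF (eqRec (n + length a))) a b)
eqRec n (In x) (In y) = eqF (eqRec n) x y
eqRec _ _ _ = False

data StreamF a r = Cons a r deriving (Functor)

instance Eq a => EqF (StreamF a) where
  eqF eq (Cons x xs) (Cons y ys) = x == y && eq xs ys

-- Hypothetical example streams.
onetwo, twoone, onetwo2 :: Graph (StreamF Int)
onetwo = Hide (Mu (\ ~(s : _) -> [Cons 1 (In (Cons 2 (Var s)))]))
twoone = Hide (Mu (\ ~(s : _) -> [Cons 2 (In (Cons 1 (Var s)))]))
onetwo2 = Hide (Mu (\ ~(s : _) -> [Cons 1 (In (Cons 2 (In (Cons 1 (In (Cons 2 (Var s)))))))]))
```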

Pretty Printing A generic pretty printing function can be defined 
as follows: 

showGraph :: ShowF f => Graph f -> String
showGraph g = showRec (iterate succ 'a') (↑ g)

showRec :: ShowF f => [Char] -> Rec f Char -> String
showRec _ (Var c) = [c]
showRec s (Mu f) =
  let r        = f s
      (fr, s') = splitAt (length r) s
  in "Mu (\n" ++ concat
       ["  " ++ [a] ++ " => " ++ v ++ "\n" | (a, v) <-
          zip fr (map (showF (showRec s')) r)] ++ ")\n"
showRec s (In fa) = showF (showRec s) fa

Like structural equality, the strategy is to have an additional
argument (s) that keeps track of a list of fresh variables. The Mu
case creates the list of results based on the seed, then builds a string 



that maps the fresh variables to the string encoding of the results. 
The class ShowF and the operation showF deal with the structure- 
specific behavior. 

class Functor f => ShowF f where
  showF :: (r -> String) -> f r -> String

As in EqF, the type r is treated as an abstract type and the
recursive call for dealing with recursive occurrences is explicitly 
passed. Instances of this class look essentially the same as the 
corresponding operation on a conventional algebraic datatype: 

instance Show a => ShowF (TreeF a) where
  showF sh Empty        = "Empty"
  showF sh (Fork x l r) = "Fork " ++ show x ++
                          " (" ++ sh l ++ ") (" ++ sh r ++ ")"

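The pretty printer can be run on its own as well. The sketch below assumes the hypothetical names Hide and reveal for ↓ and ↑. It uses a made-up tree, loop, whose left child refers back to the root. In this version the instance puts a space before each opening parenthesis.

```haskell
{-# LANGUAGE RankNTypes, DeriveFunctor #-}

data Rec f a = Var a | Mu ([a] -> [f (Rec f a)]) | In (f (Rec f a))

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Graph f = Hide { reveal :: forall a. Rec f a }

class Functor f => ShowF f where
  showF :: (r -> String) -> f r -> String

-- Variables are instantiated to the characters 'a', 'b', ...
showGraph :: ShowF f => Graph f -> String
showGraph g = showRec (iterate succ 'a') (reveal g)

showRec :: ShowF f => [Char] -> Rec f Char -> String
showRec _ (Var c) = [c]
showRec s (Mu g) =
  let r = g s
      (fr, s') = splitAt (length r) s -- names used here, and the rest
  in "Mu (\n"
       ++ concat [ "  " ++ [a] ++ " => " ++ v ++ "\n"
                 | (a, v) <- zip fr (map (showF (showRec s')) r) ]
       ++ ")\n"
showRec s (In fa) = showF (showRec s) fa

data TreeF a r = Empty | Fork a r r deriving (Functor)

instance Show a => ShowF (TreeF a) where
  showF _ Empty = "Empty"
  showF sh (Fork x l r) =
    "Fork " ++ show x ++ " (" ++ sh l ++ ") (" ++ sh r ++ ")"

-- Hypothetical example: the left child of the root is a back edge.
loop :: Graph (TreeF Int)
loop = Hide (Mu (\ ~(x : _) -> [Fork 1 (Var x) (In Empty)]))
```

Here showGraph loop prints as

Mu (
  a => Fork 1 (a) (Empty)
)

The binder introduces the name a, and the back edge is printed as that name.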


6. Application: Grammars 

This section shows a concrete application of structured graphs: 
grammar analysis and transformations. We discuss 3 different op- 
erations on grammars: nullability, first set and normalization. One 
interesting aspect of dealing with grammars is that some analyses, 
including nullability and first sets, require a special treatment for 
fixpoints to ensure termination for all grammars. Normalization is 
also interesting because it illustrates an example of a non-trivial 
transformation on graph structures. 

6.1 Grammars 

A grammar is a collection of mutually recursive productions, where 
each production has a name and a pattern, which can be a terminal, 
the empty string, a sequence of two patterns, or an alternative of 
two patterns. The pattern data type is defined as follows. 

data PatternF a = Term String | E | Seq a a | Alt a a
  deriving (Functor, Foldable, Traversable)

A grammar is then a mutually recursive collection of patterns, 
where patterns can also refer to themselves or other patterns. The 
references between patterns are normally expressed by naming 
each pattern and allowing the names, called non-terminals, to be 
used as a pattern. We represent the same grammar structure as a 
graph, where the nodes are patterns and the edges are references 
between patterns. Binders take the place of explicit names. 

Nullability One classical analysis of a grammar is nullability [7]. 
Nullability determines whether a given nonterminal can produce 
the empty string. The analysis is defined on each specific grammar 
expression node: terminals are not nullable, ε (E) is nullable, and
sequence and alternative correspond to and and or, respectively.

nullF :: PatternF Bool -> Bool
nullF (Term s)    = False
nullF E           = True
nullF (Seq g1 g2) = g1 && g2
nullF (Alt g1 g2) = g1 || g2

To process a complete grammar, the nullF analysis is applied 
to each expression, such that results of analyzing a pattern are 
propagated to each place the pattern is used. This operation is 
provided by the sfold combinator in Section 5.2. Using sfold, 
nullability analysis on grammars is defined by applying the nullF 
transformation with starting value False. 

nullable = sfold nullF False

Note that using cfold instead of sfold to define nullability: 

badNullable = cfold nullF



is problematic, because this function does not terminate for some 
inputs. For example, a "problematic" grammar for nullability
analysis is the left-recursive grammar a → a | 'x', represented by:

g = ↓ (Mu (\ ~(a : _) ->
       [Alt (Var a) (In (Term "x"))]))

With nullable, the analysis terminates, but with badNullable it
does not. The reason for the non-termination of badNullable is
that it uses the generic fixpoint combinator fix, but nullability anal- 
ysis requires a fixpoint operation that exploits monotonicity [28]. 
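The following self-contained sketch reruns nullability with sfold, assuming the hypothetical names Hide and reveal for ↓ and ↑. Besides the grammar g above (called leftAlt here), it includes two made-up left-recursive grammars: a → a 'x' and a → a 'x' | ε. badNullable is deliberately left out, because evaluating it on any of these grammars diverges.

```haskell
{-# LANGUAGE RankNTypes, DeriveFunctor #-}

data Rec f a = Var a | Mu ([a] -> [f (Rec f a)]) | In (f (Rec f a))

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Graph f = Hide { reveal :: forall a. Rec f a }

gfold :: Functor f => (t -> c) -> (([t] -> [c]) -> c) -> (f c -> c) -> Graph f -> c
gfold v l alg x = trans (reveal x)
  where
    trans (Var y) = v y
    trans (Mu g) = l (map (alg . fmap trans) . g)
    trans (In fa) = alg (fmap trans fa)

-- Start from repeat k and iterate until two approximations agree.
sfold :: (Eq t, Functor f) => (f t -> t) -> t -> Graph f -> t
sfold alg k = gfold id (head . fixVal (repeat k)) alg

fixVal :: Eq a => a -> (a -> a) -> a
fixVal v f = if v == v' then v else fixVal v' f
  where v' = f v

data PatternF a = Term String | E | Seq a a | Alt a a
  deriving (Functor)

nullF :: PatternF Bool -> Bool
nullF (Term _) = False
nullF E = True
nullF (Seq g1 g2) = g1 && g2
nullF (Alt g1 g2) = g1 || g2

nullable :: Graph PatternF -> Bool
nullable = sfold nullF False

-- The grammar g from the text: a -> a | 'x'
leftAlt :: Graph PatternF
leftAlt = Hide (Mu (\ ~(a : _) -> [Alt (Var a) (In (Term "x"))]))

-- Hypothetical: a -> a 'x'
leftSeq :: Graph PatternF
leftSeq = Hide (Mu (\ ~(a : _) -> [Seq (Var a) (In (Term "x"))]))

-- Hypothetical: a -> a 'x' | ε
leftEps :: Graph PatternF
leftEps = Hide (Mu (\ ~(a : _) -> [Alt (In (Seq (Var a) (In (Term "x")))) (In E)]))
```

All three grammars are left-recursive. sfold still terminates on them because fixVal stops as soon as two successive approximations agree.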

First Set One analysis can be reused in defining another analysis. 
This situation arises in defining the first set of a pattern. The first 
set is the set of terminals that can start sentences produced by a 
pattern. 

The first set analysis takes nullability and first sets as input, 
and returns the first set. The only interesting case is for sequences, 
which include the first set of both subpatterns if the left pattern is 
nullable. 

firstF :: PatternF (Bool, [String]) -> [String]
firstF (Term s)               = [s]
firstF E                      = []
firstF (Seq (b1, a1) (_, a2)) = if b1 then a1 `union` a2 else a1
firstF (Alt (_, a1) (_, a2))  = a1 `union` a2

To define a complete analysis, the nullability and first set analysis 
are composed. 

nullFirstF :: PatternF (Bool, [String]) -> (Bool, [String])
nullFirstF = compose (leftPart nullF) firstF

compose f g x = (f x, g x)

leftPart :: Functor f => (f a -> a) -> f (a, b) -> a
leftPart alg = alg . fmap fst

Finally, running the first/nullable analysis is similar to running 
nullability. 

firstSet = sfold nullFirstF (False, []) 
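Here is a runnable sketch of the combined analysis. It repeats the sfold machinery and the nullability and first-set algebras. It assumes the hypothetical names Hide and reveal for ↓ and ↑, and uses Data.List.union for the set union ∪. It includes two made-up grammars:
- a → 'x' a | 'y' a
- a two-rule grammar s → b 'w', b → ε | 'z', in which the first set of s depends on the nullability of b

```haskell
{-# LANGUAGE RankNTypes, DeriveFunctor #-}
import Data.List (union)

data Rec f a = Var a | Mu ([a] -> [f (Rec f a)]) | In (f (Rec f a))

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Graph f = Hide { reveal :: forall a. Rec f a }

gfold :: Functor f => (t -> c) -> (([t] -> [c]) -> c) -> (f c -> c) -> Graph f -> c
gfold v l alg x = trans (reveal x)
  where
    trans (Var y) = v y
    trans (Mu g) = l (map (alg . fmap trans) . g)
    trans (In fa) = alg (fmap trans fa)

sfold :: (Eq t, Functor f) => (f t -> t) -> t -> Graph f -> t
sfold alg k = gfold id (head . fixVal (repeat k)) alg

fixVal :: Eq a => a -> (a -> a) -> a
fixVal v f = if v == v' then v else fixVal v' f
  where v' = f v

data PatternF a = Term String | E | Seq a a | Alt a a
  deriving (Functor)

nullF :: PatternF Bool -> Bool
nullF (Term _) = False
nullF E = True
nullF (Seq g1 g2) = g1 && g2
nullF (Alt g1 g2) = g1 || g2

-- First set, given nullability and first sets of the sub-patterns.
firstF :: PatternF (Bool, [String]) -> [String]
firstF (Term s) = [s]
firstF E = []
firstF (Seq (b1, a1) (_, a2)) = if b1 then a1 `union` a2 else a1
firstF (Alt (_, a1) (_, a2)) = a1 `union` a2

compose :: (x -> a) -> (x -> b) -> x -> (a, b)
compose f g x = (f x, g x)

leftPart :: Functor f => (f a -> a) -> f (a, b) -> a
leftPart alg = alg . fmap fst

nullFirstF :: PatternF (Bool, [String]) -> (Bool, [String])
nullFirstF = compose (leftPart nullF) firstF

firstSet :: Graph PatternF -> (Bool, [String])
firstSet = sfold nullFirstF (False, [])

-- Hypothetical: a -> 'x' a | 'y' a
xy :: Graph PatternF
xy = Hide (Mu (\ ~(a : _) ->
  [Alt (In (Seq (In (Term "x")) (Var a)))
       (In (Seq (In (Term "y")) (Var a)))]))

-- Hypothetical: s -> b 'w';  b -> ε | 'z'   (s is the root)
sb :: Graph PatternF
sb = Hide (Mu (\ ~(s : b : _) ->
  [ Seq (Var b) (In (Term "w"))
  , Alt (In E) (In (Term "z")) ]))
```

For the second grammar, the iteration first finds that b is nullable. Only in a later round does it propagate 'z' into the first set of s, giving (False, ["z", "w"]).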

Normalization A more complex operation on grammars is a 
simple form of grammar normalization. A grammar is normal- 
ized if each node has a simple structure, where only one sequen- 
tial/alternative composition may appear on the right hand side 
of a rule. For example, the normalized version of the grammar 
a → 'x' a | 'y' a is:

a → b | c
b → 'x' a
c → 'y' a

Our approach to solving this problem is to define a general mech- 
anism for creating a new graph by writing nodes one by one. The 
new nodes are managed by a state monad. The state of the monad 
is a triple (n,i,o) where n is the number of nodes that have been 
defined, i is the list of referenceable node identities, and o is the list 
of node definitions. 

type MGraph f a = State (Int, [a], [f (Rec f a)]) 

A helper function addNode creates a new node, increments the 
node count and returns a reference to the new node. 

addNode x = do (pos, inn, out) <- get
               put (pos + 1, inn, out ++ [x])
               return $ Var (inn !! pos)

The actual work of normalization is done by normF, which simply 
copies leaf patterns (terminals and epsilons), but creates new nodes 
for any composite patterns. 



normF :: PatternF (Rec PatternF a) ->
         MGraph PatternF a (Rec PatternF a)
normF x@(Term s) = return $ In x
normF x@E        = return $ In x
normF x          = addNode x

The normF function is called by normalize, which traverses the 
actual graph. 

normalize :: Graph PatternF -> Graph PatternF
normalize x = ↓ (evalState (trans (↑ x)) (0, [], [])) where
  trans (Var x) = pure (Var x)
  trans (Mu g)  = pure $ Mu (\l -> runIt (l, g l) (scan (g l)))
  trans (In s)  = traverse trans s >>= normF
  scan o        = traverse (traverse trans) o >>= addNodes

The definitions of the auxiliary functions runIt and addNodes are:

runIt (l, out) m = evalState m (length out, l, [])
addNodes new = do
  (_, _, nodes) <- get
  return (new ++ nodes)

Note that, unlike nullability and first set, normalize is defined
by pattern matching on the binding structure using the auxiliary
definition trans. This is because the transformation required by
normalization is fairly complex and does not fit common
recursion schemes.
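mtl's Control.Monad.State may not be available in every setup, so the sketch below runs normalization with a minimal hand-written lazy state monad in its place. Here get, put and evalState are defined locally, not imported. Hide and reveal are hypothetical names for ↓ and ↑, and defShapes is a made-up helper that lists the constructors of the top-level definitions. The local trans and scan have signatures so that they stay polymorphic in the variable type. Like the definition above, this sketch appears to expect the root of the grammar to be a Mu binder. Outside a binder, the initial state (0, [], []) provides no node identities.

```haskell
{-# LANGUAGE RankNTypes, DeriveTraversable #-}

data Rec f a = Var a | Mu ([a] -> [f (Rec f a)]) | In (f (Rec f a))

-- Hide and reveal are hypothetical ASCII names for the paper's ↓ and ↑.
newtype Graph f = Hide { reveal :: forall a. Rec f a }

-- Minimal lazy state monad, standing in for Control.Monad.State.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State m) = State $ \s -> let (a, s1) = m s in (f a, s1)

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State mf <*> State ma = State $ \s ->
    let (f, s1) = mf s
        (a, s2) = ma s1
    in (f a, s2)

instance Monad (State s) where
  State m >>= k = State $ \s -> let (a, s1) = m s in runState (k a) s1

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

evalState :: State s a -> s -> a
evalState m s = fst (runState m s)

data PatternF a = Term String | E | Seq a a | Alt a a
  deriving (Functor, Foldable, Traversable)

-- State: (nodes defined so far, node identities, new node definitions).
type MGraph f a = State (Int, [a], [f (Rec f a)])

addNode :: f (Rec f a) -> MGraph f a (Rec f a)
addNode x = do
  (pos, inn, out) <- get
  put (pos + 1, inn, out ++ [x])
  return (Var (inn !! pos))

-- Leaves are copied; composite patterns become new nodes.
normF :: PatternF (Rec PatternF a) -> MGraph PatternF a (Rec PatternF a)
normF x@(Term _) = return (In x)
normF x@E = return (In x)
normF x = addNode x

runIt :: ([a], [b]) -> State (Int, [a], [c]) d -> d
runIt (l, out) m = evalState m (length out, l, [])

addNodes :: [c] -> State (Int, [a], [c]) [c]
addNodes new = do
  (_, _, nodes) <- get
  return (new ++ nodes)

normalize :: Graph PatternF -> Graph PatternF
normalize x = Hide (evalState (trans (reveal x)) (0, [], []))
  where
    trans :: Rec PatternF a -> MGraph PatternF a (Rec PatternF a)
    trans (Var v) = pure (Var v)
    trans (Mu g) = pure (Mu (\l -> runIt (l, g l) (scan (g l))))
    trans (In s) = traverse trans s >>= normF
    scan :: [PatternF (Rec PatternF a)] -> MGraph PatternF a [PatternF (Rec PatternF a)]
    scan o = traverse (traverse trans) o >>= addNodes

-- Hypothetical helper: constructors of the top-level definitions.
defShapes :: Graph PatternF -> [String]
defShapes gr = case reveal gr of
    Mu g -> map shape (g [0 :: Int ..])
    _ -> []
  where
    shape (Term _) = "Term"
    shape E = "E"
    shape (Seq _ _) = "Seq"
    shape (Alt _ _) = "Alt"

-- a -> 'x' a | 'y' a
xy :: Graph PatternF
xy = Hide (Mu (\ ~(a : _) ->
  [Alt (In (Seq (In (Term "x")) (Var a)))
       (In (Seq (In (Term "y")) (Var a)))]))
```

On the grammar a → 'x' a | 'y' a, the single Alt definition becomes three definitions. These match the shape of the normalized grammar a → b | c, b → 'x' a, c → 'y' a.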

7. Related Work 

Throughout the paper we have already discussed much related
work. In this section we make a finer comparison with the closest
related work and also discuss some other related work. 

Representing cyclic structures using binders In comparison to 
previous work, our PHOAS-based representation of binders allows 
a unique combination of features that: 

• Ensures well-scopedness and prevents the creation of junk
terms;

• Allows the definition of cross edges as well as back edges;

• Makes operations easy to define, without needing to unroll
cycles;

• Has fairly modest requirements from the type system; 

• Can be used in dependently typed systems like Coq or Agda; 

• Supports both inductive and co-inductive interpretations. 

Fegaras and Sheard [14] were the first to suggest representing 
cyclic structures using binders. However, their mixed-variant type 
representation has several drawbacks that are discussed in detail by 
Ghani et al. [15]. The most important drawbacks, which we sum- 
marize here, are 1) their Haskell-based representation does not pre- 
vent misuses of binders and variables and there are various ways 
to create junk terms; 2) the representation forces unrolling the cy- 
cles for most operations, which significantly reduces the usefulness 
of the approach for preserving sharing; and 3) the representation is 
problematic for use in dependently typed languages like Coq or 
Agda, which forbid mixed-variant types. To prevent junk, Fegaras
and Sheard propose a special-purpose type system. This is in
contrast to our PHOAS-based approach, which relies on parametricity
instead. Nevertheless, the idea of using a placeholder constructor for
variables, which (in our own representation) corresponds to Var, 
was first used in their approach. This placeholder constructor is im- 
portant to avoid the definition of inverse functions that arise when 
defining functions with classic HOAS approaches to binding [27] 
(see also the discussion in Section 2). 



Ghani et al. [15] suggest an alternative to Fegaras and Sheard's 
binder representation that avoids mixed-variant types. However 
their approach does not support cross edges and it requires nested 
datatypes [6] (which are not supported in many programming lan- 
guages). The lack of support for cross edges is particularly limit- 
ing since cross edges are important for most graph structures (the 
exceptions are linear structures like streams). Like us, they also 
develop combinators and they sketch a datatype-generic program- 
ming variant of their graph library. However, the use of nested 
datatypes complicates the definition of the generic combinators.
Folds for cyclic stream and tree structures require higher-ranked 
types [32] (as usual for nested datatypes [5, 25]) and they suggest 
that in a datatype-generic version one-hole contexts [1] are also 
needed, adding extra complexity to the approach. 

Building on Ghani et al.'s work, Hamana [20] proposes an 
approach that deals with cross edges. However, this representation 
requires a dependently typed language like Agda or, alternatively, 
an encoding based on generalized algebraic datatypes [33]. To deal 
with cross edges Hamana uses a path-based approach, where the 
cross edges are expressed in terms of a relative path. For example 
the path expression ↑x 11 means "go up to the node labelled x
and then descend twice through the left". Dependent types are used
to ensure that such paths are valid by keeping track of the shape of 
the structure in the types. In contrast our representation relies only 
on well-scoped labels to deal with cross edges. It is unclear to us
whether Hamana's representation extends to coinductive interpretations
of cyclic structures, since this seems to require potentially infinite 
types to model the shapes of the structure at the level of types. 

Inductive representations of unstructured graphs A different 
line of research concerns inductive representations of graphs in a 
more classical sense: unstructured representations of nodes and 
edges with no constraints on the graph structure. Erwig [13] pro- 
poses an inductive representation with two constructors: an empty 
graph constructor (the base case); or a graph extended with a node 
together with its label and edges (the inductive case). Gibbons [16] 
proposes an initial algebra semantics for unstructured (acyclic) 
graphs, but he requires 6 different types of constructors for captur- 
ing various possible configurations of nodes and edges. In contrast 
to structured graphs, this unstructured view does not impose strong
constraints on the shape of graph structures and cannot be used to
enforce constraints such as: stream nodes have exactly one edge, or
binary tree (Fork) nodes have exactly two edges.

Binding With respect to binding our work builds on Chlipala's [9] 
Parametric HOAS approach. In contrast to us, Chlipala does not 
discuss applications of PHOAS to cyclic structures nor encodings 
of recursive binders. Instead he is focused on the applications of 
PHOAS to theorem proving. There are several other approaches to 
binding [11, 14, 21, 39], which are closely related and influenced 
the development of PHOAS. However, PHOAS's unique combination
of features (which we discussed in detail in Section 2) makes this
approach particularly attractive for representing binders.

Other work Hughes proposes a functional programming
language extension for lazy memo functions [22]. This extension
allows functions like map to preserve the sharing of their inputs.
Because it is a language-based approach it is convenient and trans- 
parent to use. Using generic combinators it is possible to approx- 
imate similar convenience with structured graphs. However, the 
convenience of lazy memo functions does come at a price in terms 
of flexibility: it is not possible to define functions that require ex- 
plicit manipulation of cycles and sharing. 

Analysis and transformations on grammars have been a hot 
topic recently [4, 10, 12, 28]. The analysis and transformations
presented in Section 6 were inspired by the work of Might et al. [28]
on Brzozowski's [7] derivatives of regular expressions. Might et al. use



laziness, memoization and fixed points to allow simple definitions 
of operations on grammars and provide guarantees of termination. 
However, pointer equality is used in the implementation of
memoization. This precludes referential transparency and complicates
reasoning. In contrast, we exploit call-by-need to obtain the same
memoization effect, and due to our explicit representation of variables
we can avoid pointer equality.

8. Conclusion 

Functional programming languages have excellent mechanisms to 
program with tree structures, but graph structures have always been 
a challenge. While traditional imperative approaches can be used to 
work with graphs, many nice properties are lost. 

Structured graphs extend the nice mechanisms available in func- 
tional programming languages to graph structures. The purely func- 
tional nature of structured graphs means that conventional reason- 
ing techniques can be used to reason about graph structures. Ulti- 
mately, we believe that structured graphs offer a practical program- 
ming model for graph structures without giving up the benefits of 
functional programming. 